ABSTRACT

In this study the researcher investigated the
feasibility of constructing a valid paper-and-pencil measure of
problem solving ability. (Rationale and design of the study are
discussed in Part 1.) The principal feasibility criterion, a
correlation of at least .71 with scores on taped and coded individual
"thinking aloud" problem-solving sessions, was not met; however, the
obtained correlation (.68) for one test suggested to the researcher
that more reliable tests might achieve the criterion. Rank orderings
of subjects on the "thinking aloud" procedure and on the written tests
were highly correlated. The use of the "thinking aloud" procedure to
establish concurrent validity was evaluated, and questions about the
validity of this procedure with seventh-grade students were raised.
Investigations of the functional differences between audiotaped and
videotaped interviews revealed no differences in subject performance,
but supported the superiority of videotaping as a research tool.
Instruments used in the study and data displays are presented in
appendices to this report. (SD)



Technical Report No. 306 (Part 2 of 2 Parts) 

AN EXPLORATORY STUDY TO COMPARE 
TWO PERFORMANCE MEASURES: 
AN INTERVIEW-CODING SCHEME OF MATHEMATICAL 
PROBLEM SOLVING AND A WRITTEN TEST 

Report from the Project on Conditions of 
School Learning and Instructional Strategies 

By Donald L. Zalewski



Thomas A. Romberg and John G. Harvey
Principal Investigators 



Wisconsin Research and Development 
Center for Cognitive Learning 
The University of Wisconsin 
Madison, Wisconsin

August 1974 






Published by the Wisconsin Research and Development Center for Cognitive Learning,
supported in part as a research and development center by funds from the National
Institute of Education, Department of Health, Education, and Welfare. The opinions
expressed herein do not necessarily reflect the position or policy of the National
Institute of Education and no official endorsement by that agency should be inferred.

Center Contract No. NE-C-00-3-0065 






STATEMENT OF FOCUS



Individually Guided Education (IGE) is a new comprehensive 
system of elementary education. The following components of the 
IGE system are in varying stages of development and implementation: 
a new organization for instruction and related administrative 
arrangements; a model of instructional programing for the indi- 
vidual student; and curriculum components in prereading, reading, 
mathematics, motivation, and environmental education. The develop- 
ment of other curriculum components, of a system for managing in- 
struction by computer, and of instructional strategies is needed 
to complete the system. Continuing programmatic research is required 
to provide a sound knowledge base for the components under develop- 
ment and for improved second generation components. Finally, sys- 
tematic implementation is essential so that the products will function 
properly in the IGE schools. 

The Center plans and carries out the research, development,
and implementation components of its IGE program in this sequence:
(1) identify the needs and delimit the component problem area;
(2) assess the possible constraints: financial resources and
availability of staff; (3) formulate general plans and specific procedures
for solving the problems; (4) secure and allocate human and material
resources to carry out the plans; (5) provide for effective
communication among personnel and efficient management of activities and
resources; and (6) evaluate the effectiveness of each activity and
its contribution to the total program and correct any difficulties
through feedback mechanisms and appropriate management techniques.

A self-renewing system of elementary education is projected in
each participating elementary school, i.e., one which is less dependent
on external sources for direction and is more responsive to the needs
of the children attending each particular school. In the IGE schools,
Center-developed and other curriculum products compatible with the
Center's instructional programing model will lead to higher morale
and job satisfaction among educational personnel. Each developmental
product makes its unique contribution to IGE as it is implemented in
the schools. The various research components add to the knowledge of
Center practitioners, developers, and theorists.

ABSTRACT



The investigation reported in this thesis is on assessment in 
mathematics education. Specifically, this study explored the feasi- 
bility of using a written test to predict seventh graders' mathe- 
matical problem solving achievement as assessed by an interview- 
coding procedure. 

A search revealed that most available mathematical problem 
solving assessment procedures are commercial tests. The tests do 
not offer any definitions and their items are usually simple appli- 
cations or algorithmic situations which do not satisfy the criteria 
established in this thesis for a mathematical problem.

The method for validly assessing subjects' mathematical prob- 
lem solving achievement used in this study was a thinking aloud 
procedure. Interviews yielded audio and video taped protocols, and 
a coding system permitted classification, analysis, and scoring of 
the subjects' performances. Because of the complexity of the interview
and coding scheme, a written instrument which, it was hoped, had
high concurrent validity was developed so that it could be used as a
valid alternative to the interview and coding procedure.

Thirty-one seventh graders were asked to think aloud as they
tried to solve six mathematical problems in individually taped
interviews. The subjects' protocols were coded and scored to provide what
was assumed to be a valid assessment of their mathematical problem
solving achievement. The 31 subjects also took two 20-item written
tests which were scored by the number of correct responses. Three
rankings were developed from the interview test and one ranking was
developed from each written test.

The correlation coefficients between the written and interview 
test scores did not reach the .71 level established for feasibility. 
One coefficient reached .68 and the tests shared high rank order 
agreement. These results suggested that a more reliable test might
attain the .71 correlation. Clustering and multidimensional scaling 
verified the structure imposed by the total score ranks. 

Other findings indicated that present coding schemes can be 
applied reliably to describe subjects' problem solving behaviors 
and that the scoring system permits logical ranking of the subjects. 
However, serious questions were raised about the validity of the
thinking aloud procedure. Video taping the interviews was advantageous
because it captured silent indicators of problem solving behaviors
and took less time to code.

Chapter VI
DATA AND ANALYSES 

Introduction 

This chapter presents the data, observations, and analyses from
each of the three principal parts of the study. Scores, rankings, and
statistics for the written tests are presented first. These are followed
by the data of the interview test (IT). The statistical analysis of the
relationships between the ranks determined by the written tests and the
IT, and the results of exploratory statistical procedures, conclude the
chapter.

The Written Test (WT) 

The purpose of the WT was to produce a ranking of the same subjects 
who were to be ranked by their mathematical problem solving achievement 
on the IT. The data and statistics for the WT and a subsequent WT2 are 
presented before feasibility factors are reported. The description of 
the development of the rankings from the written tests concludes this 
section. 

Subject Response Data
Two classes totaling 63 seventh graders took the 20-item WT. The
descriptive statistics for the WT are presented separately in Table 6.1
for the 32 subjects who had been rated below average in mathematics
achievement (Group B) and the 31 students who had been rated average or
above average (Group A) by their mathematics teachers.

Table 6.1

MEAN, STANDARD DEVIATION, AND RANGE FOR THE WT:
GROUP A, GROUP B, AND COMBINED

                     Number of               Standard    Range
                     Subjects      Mean      Deviation   (20 items)

Group A                 31        7.4194      3.8796     2 to 14
Group B                 32        3.7500      2.7238     1 to 12
Groups A and B
Combined                63        5.5556      3.7963     1 to 14



According to Table 6.1, the results on the WT are consistent with
the teachers' ratings. Group A, the 31 subjects who also took the IT,
averaged 7.4 correct responses, almost double the 3.8 mean of the
lower rated Group B. Group A also attained higher numbers of correct
responses: it ranged from a low of 2 to a high of 14, while Group B
ranged from 1 to 12. Figure 6.1 illustrates the distribution of the
number of correct responses for each group.
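The combined row of Table 6.1 can be checked against the two group rows. The sketch below pools the group summaries into a combined mean and standard deviation. It assumes that the reported standard deviations use the n - 1 denominator; the report does not state this, and the function name is illustrative only.

```python
import math

def pool_groups(groups):
    """Combine (n, mean, sd) group summaries into one (n, mean, sd).

    Assumption: each sd is a sample standard deviation (n - 1 denominator).
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Total sum of squares = within-group SS + between-group SS.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    pooled_sd = math.sqrt((ss_within + ss_between) / (n_total - 1))
    return n_total, grand_mean, pooled_sd

# Group A and Group B rows of Table 6.1: (n, mean, sd)
n, mean, sd = pool_groups([(31, 7.4194, 3.8796), (32, 3.7500, 2.7238)])
print(n, round(mean, 4), round(sd, 4))
```

Under that assumption the pooled values should agree with the combined row (63 subjects, mean 5.5556, standard deviation 3.7963) to the reported precision.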

As detailed in Figure 6.1, everyone got at least one correct answer
on the WT and no subject got exactly 10 correct responses. In
addition, no Group B subject got exactly 8 or 11 items correct, and
only one subject attained the high of 12 right answers while six
subjects achieved one correct response. Group A had three subjects attain
the low of two correct while two subjects attained the high of 14
correct answers. The mode for Group A was nine as five subjects reached
this score. Group B was bimodal as seven subjects answered two WT
items correctly and seven other subjects gave three correct answers.
Some of the differences in the numbers of correct responses between
Groups A and B could be attributed to the number of omitted items:
Group A subjects skipped an average of 2.7 items on the WT while Group
B subjects omitted 4.1 items each.

Figure 6.1 Distribution of the Numbers of Correct Responses on the WT
(horizontal axis: number of correct responses on the WT; solid bar for
Group B, hollow bar for Group A)

The low averages and the number of items omitted by the WT subjects
caused the investigator to question whether the seventh graders
selected to participate in the study were representative in
mathematical ability. In order to compare the subjects to other seventh
graders, a second 20-item written test (WT2) was developed from the
available pool of items. A random sampling procedure was followed with
the restriction that any item which appeared on the WT could not be
used on the WT2.



In May, 1974, 350 seventh graders including the original 63 were
given the WT2. The investigator administered the WT2 to the 63 subjects
(School 1), and a teacher from Des Moines, Iowa, had all three
of his seventh grade classes (School 2) take it. One teacher from a
middle school (School 3) in Madison, Wisconsin, gave the WT2 to all
four of her classes, and two teachers from another middle school
(School 4) in Madison administered it to 128 students. The descriptive
statistics for the entire group, for each school separately, and
for the original Groups A and B are presented in Table 6.2.

Table 6.2

DESCRIPTIVE STATISTICS FOR THE WT2:
BY SCHOOL, COMBINED, AND BY GROUPS A AND B

                     Number of               Standard    Range
                     Subjects      Mean      Deviation   (20 items)

School 1
(Group A and B)         63        6.1111      3.8189     0 to 16
School 2                66        3.9848      3.7107     0 to 17
School 3                93        4.3978      3.4108     0 to 13
School 4               128        7.9688      4.1544     0 to 19
Schools Combined       350        5.9343      4.1682     0 to 19
Group A                 31        8.1290      3.7748     2 to 16
Group B                 32        4.1562      2.7133     0 to 9



The results of the WT2 indicate that the subjects in this study
(School 1) compared favorably to the other seventh graders who took
the WT2. Their average of 6.1 was about halfway between the 4.4
average of the Madison school from a low academic achievement area of
the city and the 8.0 average of the Madison school from a high
achievement area. Group A performed only slightly better than the
highest mean of any school, and all three Madison schools attained
a higher average than the Des Moines school's mean of 4.0. According
to the results on the WT2, it appeared that the subjects used
in this study were not atypical seventh graders and that their low
average on the WT was probably due to the general difficulty that
students encountered with the test items.

WT Length and Reliability 

The low averages achieved by the students did not affect the
feasibility of the WT, but two factors, test length and reliability,
were important. A test which took more than an hour to complete or
which did not attain a reliability of .80 would not meet the
expectations of the investigator. Hoyt's internal consistency measures of
reliability for the WT and the WT2 are presented in Table 6.3.

Across the entire sample of students, satisfactory reliabilities
of .82 on the WT and .84 on the WT2 were reached. Group A on both
tests and School 3 on the WT2 had measures sufficiently close to .80
to be acceptable. Only Group B's reliabilities of .73 on the WT and
.68 on the WT2 did not attain the desired minimum. However, since
this group did not participate in the IT and the overall reliability
for School 1 was adequate, the feasibility of the written test was
not jeopardized.
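For readers unfamiliar with Hoyt's measure, the following is a minimal sketch of how an internal consistency coefficient of this kind is obtained from a persons-by-items matrix of item scores, using the mean squares of a two-way analysis of variance. The response matrix is hypothetical and is not data from this study; the function name is likewise illustrative.

```python
def hoyt_reliability(scores):
    """Hoyt's ANOVA reliability for a persons x items score matrix.

    reliability = 1 - MS(residual) / MS(persons)
    """
    n_persons = len(scores)
    n_items = len(scores[0])
    grand = sum(map(sum, scores)) / (n_persons * n_items)
    person_means = [sum(row) / n_items for row in scores]
    item_means = [sum(row[j] for row in scores) / n_persons
                  for j in range(n_items)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = n_items * sum((m - grand) ** 2 for m in person_means)
    ss_items = n_persons * sum((m - grand) ** 2 for m in item_means)
    ss_error = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n_persons - 1)
    ms_error = ss_error / ((n_persons - 1) * (n_items - 1))
    return 1 - ms_error / ms_persons

# Hypothetical responses: 5 subjects x 4 items, 1 = correct, 0 = incorrect.
demo = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(hoyt_reliability(demo), 4))
```

For right/wrong item scores this coefficient coincides with the Kuder-Richardson formula 20, so either computation could be used to check a reported value.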

Table 6.3

HOYT'S RELIABILITY COEFFICIENT FOR THE WT AND THE WT2

                                Reliability    Standard Error

WT
     Group A                       .7968           1.7045
     Group B                       .7270           1.3807
     Group A & B Combined          .8179           1.5789

WT2
     School 1 (Group A & B)        .8023           1.6551
     School 2                      .8356           1.4665
     School 3                      .7897           1.5223
     School 4                      .8136           1.7484
     Combined Schools              .8374           1.6380
     Group A                       .7737           1.7504
     Group B                       .6774           1.4921

Test length as measured by the time necessary for students to
complete the test was a second factor by which feasibility was to be
determined. The investigator recorded the completion times for 59 of
the 63 subjects during the WT2, forgetting to note two times. The two
other missing subjects had not completed the test during the available
class time (37 minutes) and worked during the next period with a second
observer who did not record their completion times. Subjects in School
1 averaged 27 minutes to complete the WT2 with one student finishing in
16 minutes and three requiring 37 minutes. The two missing subjects
who needed extra time would not appreciably alter the observations
which were made. The 27 minute average indicated that School 1 subjects
could respond to the 20 items on the WT2 without being rushed.
Since School 1 was close to the combined school average in achievement,
it was assumed that the completion time averages of other schools would
not vary greatly from the 27 minutes. Furthermore, the time average
was sufficiently low so that a large deviation such as ten minutes (the
maximum recorded) would only produce an average of 37 minutes, a
completion time which would be less than the one hour maximum for
feasibility and which would permit the test to be administered to
most students during a single class period of at least 45 minutes.

Written Test Rankings 

The rank of a subject on the WT was to be based solely on the
number of correct responses, and only those subjects (Group A) who
participated in the IT were ranked. Since two written tests, the WT
and the WT2, were administered, rankings were determined for each
instrument and are presented in Table 6.4.

As can be seen in Table 6.4, the rankings developed from the WT
and the WT2 are similar. The rankings agree perfectly on subjects 8
(rank 6.5), 16 (rank 24), and 31 (rank 18.5), and agree closely on
subjects 2 (WT rank 11 to WT2 rank 10.5), 10 (WT rank 4 to WT2 rank
2.5), and 27 (WT rank 1.5 to WT2 rank 2.5). Subject 19 was tied with
subject 27 (WT rank 1.5) on the WT and ranked 6.5 on the WT2. The
largest discrepancy in the rankings was for subject 20, who had a WT
rank of 24 and a WT2 rank of 9.
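The fractional ranks quoted above arise because tied subjects share the average of the rank positions they occupy. A minimal sketch of that rule follows, applied to the WT numbers correct listed for subjects 1 through 31 in Table 6.4; the function name is illustrative and not part of the study.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) down, averaging ranks within ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next subject has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

# WT numbers correct for subjects 1..31 (Table 6.4).
wt_correct = [8, 9, 7, 9, 9, 7, 5, 12, 6, 13, 5, 3, 3, 3, 11, 4,
              13, 9, 14, 4, 9, 7, 2, 2, 12, 13, 14, 4, 5, 2, 6]
wt_ranks = average_ranks(wt_correct)
for subject in (8, 16, 27, 31):
    print(subject, wt_ranks[subject - 1])
```

Subjects 19 and 27, for example, both answered 14 items correctly and so share rank 1.5, the average of positions 1 and 2.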

Since the two written tests were formed from the same item pool
they should have been equivalent. However, the mean for Group A was
slightly higher on the WT2 and the WT2 rankings of some subjects
differed from their WT rankings. The gamma statistic of Goodman and Kruskal
was computed to check the degree of association and was found to be .55.
This value indicated that, given two subjects with untied ranks on the
written tests, the probability that the ordering of their ranks is the
same exceeds the probability that their ranks will have a different
ordering by .55. Despite the high ranking agreement, the investigator
decided to compare both written test rankings to the IT ranking to see
which test produced a stronger relationship.

Table 6.4

RANKINGS OF GROUP A BASED ON THE RESULTS OF THE WT AND THE WT2

Subject    WT Number    WT Rank**    WT2 Number    WT2 Rank**
Number*    Correct                   Correct

   1           8          14             12
   2           9          11             10           10.5
   3           7          16
   4           9          11
   5           9          11
   6           7          16
   7           5          21
   8          12           6.5           12            6.5
   9           6          18.5
  10          13           4             14            2.5
  11           5          21              3
  12           3          27              6           21.5
  13           3          27              2           30.5
  14           3          27              7           18.5
  15          11           8             13            4
  16           4          24              5           24
  17          13           4              7           18.5
  18           9          11              8           15.5
  19          14           1.5           12            6.5
  20           4          24             11            9
  21           9          11              9           13
  22           7          16              9           13
  23           2          30              5           24
  24           2          30              2           30.5
  25          12           6.5           16            1
  26          13           4             10           10.5
  27          14           1.5           14            2.5
  28           4          24              6           21.5
  29           5          21              4           27
  30           2          30              8           15.5
  31           6          18.5            7           18.5

 * The subject number represents the order of his/her appearance
   in the interviews. Subjects 1-16 were video taped and
   subjects 17-31 were audio taped.

** In case of ties on number correct, the ranks were averaged.
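The gamma statistic reported for the two written test rankings can be sketched as follows: pairs of subjects ordered the same way on both tests are counted as concordant, pairs ordered oppositely as discordant, and pairs tied on either test are ignored. The two short score lists are hypothetical, since the complete WT2 data are not reproduced here, and the function name is illustrative.

```python
def goodman_kruskal_gamma(x, y):
    """Gamma = (concordant - discordant) / (concordant + discordant)."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
            # product == 0: the pair is tied on at least one test and is ignored
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical scores for five subjects on two tests.
test_one = [1, 2, 3, 4, 5]
test_two = [1, 3, 2, 5, 4]
print(goodman_kruskal_gamma(test_one, test_two))
```

In this illustration eight of the ten pairs are concordant and two are discordant, so gamma is (8 - 2) / 10 = .6. Because only the direction of each difference matters, the same value is obtained from raw scores or from ranks.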

The Interview Test (IT)

Group A, the students designated as at least average achievers
in mathematics, also participated in an interview test (IT) where
the thinking aloud procedure was followed. Their mathematical problem
solving protocols were coded, scored, and ranked. The relevant
data resulting from these procedures are reported in this section.

The Thinking Aloud Procedure 
The first question posed in Chapter IV concerned the effective- 
ness of the thinking aloud procedure and related coding scheme for 
capturing and classifying the mathematical problem solving behaviors 
of seventh graders. Data and observations resulting from the inter- 
views and coding were to provide empirical evidence for making judg- 
ments. 

During the interviews, the investigator observed four indicators
which could determine the effectiveness of the thinking aloud procedure.
The signs included subjects' remarks concerning their ability to think
aloud, periods of silence, the use of retrospection, and subject
nervousness. Table 6.5 summarizes the occurrences of these indicators
separately for the video taped and the audio taped interviews.

Table 6.5

INDICATORS OF THINKING ALOUD DIFFICULTIES

                                             During          During
                                             Video Taping    Audio Taping

Number of Subjects Who Made Comments
on Their Thinking Aloud Ability                   2               2

Number of Subjects Who Explained by
Retrospection                                     5               4

Number of Silent Pauses Which Occurred:
     of 30-60 seconds                            20              25
     over 60 seconds                             19              21

Number of Subjects Who Were Judged to
be Nervous                                        7               6

As seen in Table 6.5, two subjects from each taping made a direct
comment about their ability to think aloud. For example, subject number
five worked calmly but quietly, and after reading the fifth problem
explained to the observer, "I'm gonna (sic) figure this out in my
mind and tell you when I'm done, or else I can't get it". Audio taped
subject 20 commented that she had to spend half of her time "concentrating
on thinking out loud". However, three subjects who made comments were
rated (rating explained below) "Very Good" at thinking aloud and only
one of these three indicated any nervousness.

Retrospection was indicated by the number of subjects who offered
explanations after they had achieved an answer. Five video taped
subjects used retrospection in a total of ten instances with one subject
resorting to retrospection on all five problems which she solved.
Four audio taped subjects accounted for eight instances of
retrospection.

Silent pauses were periods of time during which subjects produced
no codable behavior while they were attempting to solve the problems.
Pauses less than 30 seconds were often used by subjects for
assimilating information, organizing ideas, or silent recapitulation and were
not considered indicators of thinking aloud difficulty. However, pauses
longer than 30 seconds usually occurred in the protocols of subjects
who generally had difficulties expressing their thoughts aloud. Since
the number and duration of silent pauses seemed to be a strong indicator
of the ease with which subjects could think aloud, all pauses over 30
seconds were recorded and dichotomized into pauses of less than one
minute and those lasting longer than one minute. As indicated in Table 6.5, the
silent pauses occurred more frequently during audio taping. Twenty-five
short and 21 long pauses were noted as compared to the 20 short and 19
long pauses which occurred during video taping. Six subjects made no
pauses over 30 seconds and 13 used only one or two pauses of either
length. At the other extreme were subjects 8 and 19, twin brothers who
had much difficulty thinking aloud. Subject 8 paused eight separate
times for a total of 570 seconds and his brother lapsed into silence
13 separate times for a total of 1,020 seconds. One of subject 13's
silent intervals continued 270 seconds during which the observer used
prodding questions four times without provoking a response which could
be coded.

The final category in Table 6.5 was a result of the subjects'
unspoken reactions to participating in the interview. Four video taped
subjects and three audio taped subjects produced clear indications of
nervousness. The most frequent and obvious signs included tapping a
pencil, scratching parts of the body, or frequent shifting of body
positions. Three other subjects from each taping procedure exhibited
less obvious nervous behaviors. A subtle indicator of nervousness was
the habit of subjects to read the problems rapidly or carelessly,
sometimes slurring or mispronouncing words. Four subjects exhibited
a noticeable physical habit and six read rapidly. At the beginning of
one interview, a subject orally indicated some nervousness when she
expressed concern about her ability to solve the problems, and a week
after the interviews, another subject directly stated that she was
nervous during the interviews.

The number of silent pauses noted earlier seemed to be a strong
indicator of a subject's ability to think aloud. Thus, a categorizing
scheme was created in order to rate subjects and judge the
effectiveness of the thinking aloud procedure. The categories were "Very Good"
(2 or less pauses), "Good" (3 or 4 pauses), "Fair" (5 or 6 pauses),
and "Poor" (7 or more pauses). The results of applying the rating
scheme are summarized in Table 6.6.






Table 6.6

THINKING ALOUD RATING OF SUBJECTS

                                     Number of Video    Number of Audio
Rating                               Taped Subjects     Taped Subjects

Very Good (2 or less pauses)               11                  8
Good      (3 or 4 pauses)                   1                  3
Fair      (5 or 6 pauses)                   2                  3
Poor      (More than 6 pauses)              2                  1



As indicated in Table 6.6, the thinking aloud abilities of video
taped and of audio taped subjects were comparable. Eleven video taped
subjects rated "Very Good" while only eight audio taped subjects achieved
that rating, but audio taping had three "Good" verbalizers to only
one "Good" for video taping. Each type of taping had four subjects
who were rated either "Fair" or "Poor" at thinking aloud. Over both
taping procedures, 23 of the 31 subjects were able to think aloud
without much silent hesitation and eight subjects had difficulty
verbalizing their thoughts consistently.
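The rating rule and the tallies quoted above can be expressed compactly. In the sketch below the thresholds are those of the categorizing scheme and the counts are those of Table 6.6; the function and variable names are illustrative only.

```python
def thinking_aloud_rating(long_pauses):
    """Map a subject's number of silent pauses over 30 seconds to a rating."""
    if long_pauses <= 2:
        return "Very Good"
    if long_pauses <= 4:
        return "Good"
    if long_pauses <= 6:
        return "Fair"
    return "Poor"

# Counts from Table 6.6: rating -> (video taped, audio taped)
table_6_6 = {
    "Very Good": (11, 8),
    "Good": (1, 3),
    "Fair": (2, 3),
    "Poor": (2, 1),
}
video_total = sum(video for video, _ in table_6_6.values())
audio_total = sum(audio for _, audio in table_6_6.values())
fluent = sum(sum(table_6_6[r]) for r in ("Very Good", "Good"))
hesitant = sum(sum(table_6_6[r]) for r in ("Fair", "Poor"))
print(video_total, audio_total, fluent, hesitant)
```

The totals recover the 16 video taped and 15 audio taped subjects, and the split of 23 subjects rated "Very Good" or "Good" against eight rated "Fair" or "Poor". Subjects 8 and 19, with eight and 13 pauses, both fall in the "Poor" category under this rule.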

The data and observations resulting from the problem solving
interviews did not produce any clear indications of the effectiveness
of the thinking aloud procedure. However, it was obvious that some
seventh graders found it very difficult to think aloud, as evidenced
by their silent pauses and retrospection. The implications of the
subjects' inability to verbalize are discussed in Chapter VII. Data
resulting from the application of the coding system to the subjects'
interview protocols are introduced next.

The Coding Systems 

After all the interviews were conducted, the resulting taped
protocols were coded according to the revised coding system found in
Appendix G and were scored by Lucas' point system which was described
in Chapter IV and is summarized in Appendix F. The solution and coding
times data, coder reliability measures, and observations about the
coding system are presented in this section.

During the pilot study, the investigator used Lucas' coding system
for the protocols and was fortunate enough to receive his assistance
as a second coder. Using a direct ratio of the frequency of agreements
to the total frequency of agreements and disagreements, an agreement
measure was computed for the process-sequence coding (.72), the
checklist (.67), and the scoring system on "Approach" (.93), "Plan"
(.86), and "Result" (.86). The agreement measure for each area was
acceptable and the sources of disagreement on the checklist and
process-sequence codings were examined in order to improve the
investigator's interpretation and application of Lucas' system.

The modifications of Lucas' system for this study necessitated
additional agreement measures and three coders including the
investigator (Coder 1) were used to establish them. Coder 2 was Norman Loomer,
a mathematics instructor at Ripon College in Ripon, Wisconsin. He was
also conducting a study which utilized Lucas' coding and scoring system,
thus little additional training and few practice comparisons were
necessary for him to apply the investigator's system. In addition,
Loomer made coding suggestions and helped in coder agreement decisions.
Coder 3 was Ruth Meyer, a mathematics education graduate student at the
University of Wisconsin-Madison. She is also an experienced teacher who
has taught mathematics at all levels from elementary school through
college. After Meyer practiced using Lucas' system, the coded protocols
were compared and recoded until close agreement was reached with the
investigator.

After the training and practice periods, the investigator randomly
selected one video taped protocol and one audio taped protocol from each
problem of the IT. These 12 protocols and four randomly selected
protocols from Loomer's study formed the sample for establishing coder
agreement. No protocols which had been used for practice were included
in the sample and the three coders all coded the same 16 protocols.

Since good agreement had been established with Lucas during the
pilot study, the new variables and modifications which the
investigator introduced were the central concern of the second interjudge
agreement measure. However, in order to assist Loomer in establishing
an intercoder agreement for his study, a large subset of behavioral
variables which represented both Lucas' and this investigator's coding
system was selected. The new variables Rr, DX, TR, and TS and key
variables S, Mf, Me, Alg, DS, DA, and C represented processes. The
variables Rs (restates the problem in his own words), An (reasoning
by analogy), Vs (varies the process), and Vm (varies the problem) were
omitted because the behaviors appeared infrequently during the tapings.
The variable R (reads the problem) was omitted because each subject was
directed to read the problem aloud before he began to solve it and any
later reading was coded as Rr (rereading). The N (not classifiable)
was not considered an important process and was omitted.

Lucas' five outcome variables and his punctuation marks were suffi- 
ciently well defined so that not much practice disagreement occurred 
on these variables. Furthermore, some disagreement on these variables 
could be tolerated without affecting the evaluation of a subject's 
achievement. Thus these variables were omitted from the agreement 
comparisons. 

The error variables "se" (structural error in process) and "ee"
(executive error in process) were included because a new checklist
category had been established for structural error. However, the
error correction variables were omitted because high agreement was
noticed during practice comparisons.

Eight variables from the checklist were included in the subsystem
for determining coder agreement. The checklist variables for
misinterpreting data, misinterpreting the question, algebraic
manipulations, and arithmetic computational error were included to check
the clarity of the new error categories. The variable for using an
appropriate representative diagram was included because subjects in
Loomer's study used drawings frequently. The only other checklist
variables of common interest to Loomer and this investigator were the
performance measures involving scores: Approach, Plan, and Result. A
fourth scoring measure, Total, was dependent upon the others and
thus not included. The remaining checklist variables were omitted from
the study because they did not depend heavily upon individual judgment
(e.g., rereads entire problem) or they appeared too infrequently (e.g.,
recalls related formula) to get a meaningful and reliable agreement
measure.

After the 16 protocols were coded, comparisons were made between
two coders at a time. The frequencies of agreement, of disagreement,
and of positive observations were recorded. A positive observation
was an instance in which either coder alone or both coders simultaneously
identified the occurrence of the behavior. The frequency of agreement
included the number of protocols in which both coders agreed that the
behavior did not occur. After the three frequencies were obtained,
agreement measures were computed.
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Two agreement measures were computed for each variable. A direct 
ratio of the frequency of agreements to the sum of the frequencies of 
agreements and disagreements produced a simple agreement measure based 
only on positive observations. However, coders can disagree consistently 
(Coder A regularly codes the behavior at least as many times as Coder B 
does) or inconsistently (Coder A codes the behavior more frequently 
than Coder B does for some subjects, but Coder B codes the behavior more 
often for other subjects) and the type of disagreement was important, 
especially for Loomer's study of heuristic training effects. Thus,
indices of reliability which included coder biases were also computed.
Kruskal's gamma statistic (cf. Hays, 1963, p. 655) is reported for the
three dichotomous variables (those marked d in Table 6.7), and a
product-moment correlation coefficient is reported for the remainder of
the variables. Appendix J
contains the frequencies and agreement measures for each pair of coders 
and Table 6.7 presents the averages computed from the three pairings. 
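
The two kinds of agreement measure described above can be sketched as follows. This is a minimal illustration, not the program used in the study: the function names are hypothetical, the gamma statistic is computed in its usual concordant/discordant-pair form, and the two lists of 0/1 codes are invented for the example. Only the frequencies for S are taken from Table 6.7.

```python
from itertools import combinations


def agreement_ratio(agreements, disagreements):
    """Simple agreement measure: agreements / (agreements + disagreements)."""
    return agreements / (agreements + disagreements)


def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D), where C and D count concordant and
    discordant pairs of observations; pairs tied on x or y are ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Averaged frequencies for S (Sep. Data) in Table 6.7.
ratio_s = agreement_ratio(13.7, 3.0)

# Hypothetical 0/1 codes from two coders over eight protocols
# (illustrative only, not data from the study).
coder_a = [1, 0, 0, 1, 0, 0, 1, 0]
coder_b = [1, 0, 0, 0, 0, 0, 1, 1]
gamma_ab = goodman_kruskal_gamma(coder_a, coder_b)

print(round(ratio_s, 2), round(gamma_ab, 3))
```

For a dichotomous variable the gamma statistic reduces to a comparison of the products of the diagonal and off-diagonal cells of the 2 x 2 table of codes, which is why it is sensitive to a few disagreements when positive observations are rare.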

According to the agreement ratios in Table 6.7, Me (model by equation
or relation) produced the lowest value of .61 and the remainder of
the variables were agreed upon by the coders at least 70 percent of the
time. Since Me was not a new or important variable, the value of .61
was accepted. Furthermore, the reliability index indicated that the
disagreements on Me formed a highly consistent pattern and that coder
bias was not a critical factor.

Table 6.7

AGREEMENT MEASURE AVERAGES OVER CODERS 1, 2, AND 3

                        Index of                         Positive
                        Reli-     Agree-    Disagree-    Observa-    Agreement
Variable                ability   ments     ments        tions       Ratio

Rr (Rereading)           .91      23.3       7.0          --          .77
S (Sep. Data)            .48      13.7       3.0          5.3         .82
DS (Deduction/Syn.)      .94      26.3       8.0         28.0         .77
DX (Deduction/Exp.)      .56      14.7       1.7          4.3         .90
DA (Deduction/Anal.)     .85      19.0       7.7         15.7         .71
TS (Syst. Trials)        .92      16.7       1.3          6.0         .93
TR (Rand. Trials)        .73      14.7       0.7          3.3         .95
Me (Model)               .88      33.7      21.7         47.7         .61
ee (Exec. Error)         .96      18.0       5.0         16.3         .78
se (Struc. Error)        .58      13.0       3.7          7.3         .78
Mj. (Diagram) d         1.00      16.7       0.0          3.3        1.00
Alg (Algorithm)          .88      44.0      18.0         60.0         .71
C (Check)                .81      16.3       5.7         12.0         .74
(Rep. Dia./Yes) d       1.00      16.0       0.0          5.0        1.00
(Algeb./ee)              .83      18.7       2.3          8.0         .84
(Arith./ee)              .82      16.7       0.3          4.3         .98
X20 (Data/se)            .34      14.0       2.0          3.7         .88
X21 (Question/se)        .79      15.3       0.7          1.7         .96
X26 (Plan Score) d       .96      14.0       2.0         16.0         .88
X27 (App. Score)         .67      11.3       4.7         16.0         .71
X28 (Res. Score)         .92      13.0       3.0         16.0         .81

d  A dichotomous variable
-- Entry illegible in the source copy

For the four variables S (separates and summarizes the data),
DX (deduction through exploratory work), se (structural error), and
X20 (misinterprets data), which had low reliability indices of .48,
.56, .58, and .34 respectively, high agreement ratios were computed.
This inconsistency was a result of the distribution of the
disagreements, the small number of positive observations, and the high
ratio of the number of disagreements to the number of positive
observations. For example, X20 (misinterprets data) was found only four
times in comparing the coding of Coders 1 and 2. The low reliability
index resulted because Coder 1 identified the behavior when Coder 2 did
not note it in one instance and, in two instances, Coder 1 failed to
identify the behavior when Coder 2 had noted it. A high agreement
ratio (.84) resulted because the coders agreed once when the behavior
did occur and they agreed that the behavior was not present
in 12 observations, thus producing a ratio of 13 agreements to 16
(13 agreements and 3 disagreements) positive observations.
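
The Coder 1 and Coder 2 example can be worked through numerically. The sketch below uses only the counts quoted in the paragraph (one joint identification, twelve joint absences, one identification by Coder 1 alone, two by Coder 2 alone) and assumes that each of the 16 protocols is treated as a single 0/1 observation, in which case the product-moment correlation is the phi coefficient. It is not meant to reproduce the reported values exactly: the counts as quoted give 13/16 for the ratio, and the reliability index in Table 6.7 is an average over three coder pairings.

```python
import math

# Counts quoted in the text for the Coder 1 / Coder 2 comparison.
both_present, both_absent = 1, 12
only_coder1, only_coder2 = 1, 2

agreements = both_present + both_absent
disagreements = only_coder1 + only_coder2
ratio = agreements / (agreements + disagreements)

# Marginal totals of the 2 x 2 table of codes.
coder1_yes = both_present + only_coder1
coder1_no = only_coder2 + both_absent
coder2_yes = both_present + only_coder2
coder2_no = only_coder1 + both_absent

# Product-moment correlation of two 0/1 variables (phi coefficient).
phi = (both_present * both_absent - only_coder1 * only_coder2) / math.sqrt(
    coder1_yes * coder1_no * coder2_yes * coder2_no
)

print(ratio, round(phi, 2))
```

With so few positive observations, three disagreements are enough to pull the correlation down to roughly .30 while the agreement ratio stays above .80, which is the pattern the paragraph describes.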

Since their agreement ratios were uniformly high, the low indices
of reliability for S, DX, se, and X20 were considered spurious.
However, each variable was examined further to check its effect in
this study. The S variable was not influential in determining a
subject's ranking.

For DX, Coders 1 and 2 had an agreement ratio of .88 and a
reliability of .68. Since these two coders were the implementers of
the coding scheme, the consistency was judged adequate. The
disagreements on DX were negated when coders used DS accompanied by se
to indicate that the subject was combining the data indiscriminately
or that the subject misinterpreted the question. Both codings
resulted in a lower score for the subject's attack (Plan) or for his
understanding (Approach) of the problem, and a subject who exhibited
the errant behaviors usually attained a poor score for his answer
(Result). The acceptable agreement ratios and reliability indices
for X26, X27, and X28 supported the judgment that the disagreements on
DX did not seriously affect the subject's rankings.

Structural errors (se) were an important factor in applying
Lucas' scoring system and a reliability index of .58 appeared low.
An inspection of the sources of disagreements discounted possible IT
ranking inconsistencies. The investigator, Coder 1, shared
reliabilities of .71 and .66 with Coders 2 and 3 respectively. These
values indicated that the structural error codes were applied with
acceptable consistency by the investigator. Inconsistency arose when coders
used DS accompanied by se instead of DX. Other disagreements occurred
when a coder classified an error as ee instead of se. Uncorrected
errors of either type or poorly planned processes, regardless of the
label, also resulted in a lower subject score and ranking. Thus, the
inconsistencies of se labeling did not adversely affect the scoring
and ranking system.

The variable X20 was dependent upon the identification of se;
thus its effect upon the scoring and ranking system was also
discounted. The types of disagreements which accounted for the low
reliability index of se were chiefly responsible for the low index
of X20.

After agreement ratios and reliability measures were computed,
examined, and evaluated, the coded protocols and scores were used to
search for ranking schemes. The IT ranking procedures are described
next.

The IT Ranking Schemes 
The second major question posed in Chapter IV was, "Is it
possible to assess, separate, and rank seventh graders according to their
coded mathematical problem solving protocols?" Lucas' scoring system
was used for assessing problem solving achievement and determining
rankings.

After the application of Lucas' scoring system, four measures were
available for each problem: Approach (0 or 1), Plan (0, 1, or 2),
Result (0, 1, or 2), and Problem Total (0-5) (see Appendix K). The first
ranking scheme (Ranking A) was developed by summing problem totals for
each subject across the six problems and assigning the rank of 1 to the
highest sum. Tied ranks were averaged. The sums represented the
combined evaluation of a student's understanding of the problem, the
quality of his plans, and the accuracy of his results. The totals and
ranks for A are presented in Table 6.8.

According to Ranking A, subject 15 had the highest total (24
points) and was ranked first, while subjects 24 and 29 scored no points 
and shared the average of ranks 30 and 31. Other ties occurred at 
scores of 18, 10, 9, 8, 5, 4, and 3 points. Five subjects were tied 
at 9 to share rank 14 (average of 12-16) and five other students were 
tied at 8 to share rank 19 (average of 17-21). Except for three 
subjects tied at 18 points, the remaining ties occurred in pairs. 






Table 6.8

INTERVIEW TEST SCORES AND RANKINGS A, B, AND C

           Approach   Plan       Result     Total
           Sub-       Sub-       Sub-       Interview
           Total      Total      Total      Test        Ranking   Ranking   Ranking
Subject    A_i        P_i        R_i        Score       A         B         C

   1         3         --         --         14          8         --        9
   2         2          3          4          9         14*       19.5*     19.5*
   3         2          3          3          8         19*       21        21
   4         5          7          6         18          5*        4.5*      4.5*
   5         3          4          1          8         19*       15        12
   6         1          1          1          3         28.5*     29        29
   7         2          2          1          5         24.5*     24.5*     24.5*
   8         2          3          4          9         14*       19.5*     19.5*
   9         6          7          6         19          3         2         --
  10         2          2          3          7         22        22        22
  11         4          3          1          8         19*       11        16
  12         5          3          1          9         14*        7        14
  13         2          2          2          6         23        23        23
  14         2          1          1          4         26.5*     26        27
  15         6         10          8         24          1         1         1
  16         1          2          1          4         26.5*     28        26
  17         3          4          3         10         10.5*     13.5*     10.5*
  18         2          2          1          5         24.5*     24.5*     24.5*
  19         4          7          7         18          5*        8         6
  20         3          3          2          8         19*       17        18
  21         3          6          4         13          9        12         8
  22         3          4          3         10         10.5*     13.5*     10.5*
  23         3          3          3          9         14*       16        17
  24         0          0          0          0         30.5*     30.5*     30.5*
  25         4          6          7         17          7         9         7
  26         5          8          7         20          2         3         2
  27         5          7          6         18          5*        4.5*      4.5*
  28         2          1          0          3         28.5*     27        28
  29         0          0          0          0         30.5*     30.5*     30.5*
  30         2          4          2          8         19*       18        13
  31         4          3          2          9         14*       10        15

*  Ties occurred
-- Entry illegible in the source copy

Note: Subtotals were a subject's partial scores summed
across the six interview problems.

Because of its large number of ties, Ranking A did not separate subjects
well and was likely to produce a low association with written test
ranks. Thus, two additional schemes (Rankings B and C) which better
differentiated between subjects were developed. Seeing that
subjects with tied scores earned their points in different phases of
the problem solving process, the investigator attempted to categorize
subjects by their subtotals for Approach (A), Plan (P), and Result
(R): A_i was equal to the sum of the Approach scores for subject i
across the six problems; P_i was equal to the sum of the Plan scores;
and R_i was equal to the sum of the Result scores. Thus,
subject j who achieved scores of (1, 1, 0), (1, 2, 2), (0, 0, 0), (1, 2,
1), (1, 1, 1) and (1, 1, 2) for his Approach, Plan, and Results
respectively, attained subscores of A_j = 5, P_j = 7, and R_j = 6.

Ranking B was based on A_i, P_i, and R_i, but gave priority to
subjects who demonstrated an understanding of the most problems. By this
system, the highest A_i score was ranked first. In case of ties, the
subject with the highest P_i score received the next rank. If
subjects were tied after comparing the A_i's and P_i's, then the R_i's were
compared with the higher value receiving the next rank. If ties
existed for all three scores, the ranks were averaged.

Ranking C was similar to Ranking B, but it emphasized the
subject's plans and processes. The P_i scores of subjects were the first
determiner of ranks and the A_i and R_i scores were compared in that
order if ties occurred. Table 6.8 presents the A_i, P_i, and R_i scores
with the total scores, and Rankings A, B, and C.
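
The three ranking rules can be sketched in a few lines. The helper name `average_ranks` is hypothetical, and the sketch ranks only six subjects whose subtotals are legible in Table 6.8, so the ranks it produces are ranks within that subset rather than among all 31 subjects.

```python
def average_ranks(keys):
    """Rank keys in descending order (rank 1 = largest key);
    tied keys share the mean of the ranks they occupy."""
    order = sorted(range(len(keys)), key=lambda i: keys[i], reverse=True)
    ranks = [0.0] * len(keys)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and keys[order[end + 1]] == keys[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


# (Approach, Plan, Result) subtotals for six subjects from Table 6.8.
subtotals = {15: (6, 10, 8), 26: (5, 8, 7), 9: (6, 7, 6),
             4: (5, 7, 6), 27: (5, 7, 6), 19: (4, 7, 7)}
subjects = list(subtotals)

# Ranking A: total score only.
ranking_a = average_ranks([sum(subtotals[s]) for s in subjects])
# Ranking B: Approach first, then Plan, then Result.
ranking_b = average_ranks([subtotals[s] for s in subjects])
# Ranking C: Plan first, then Approach, then Result.
ranking_c = average_ranks(
    [(subtotals[s][1], subtotals[s][0], subtotals[s][2]) for s in subjects]
)

for s, a, b, c in zip(subjects, ranking_a, ranking_b, ranking_c):
    print(s, a, b, c)
```

Comparing tuples lexicographically expresses the "compare the next subscore only if tied" rule directly, and subjects with identical subscores (here 4 and 27) remain tied under every ordering of the three subscores.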

As can be seen in Table 6.8, Rankings A, B, and C agree on the
ranks assigned to subjects 7, 10, 13, 15, 18, 24 and 29 and are
similar in the other ranks. Since four pairs of subjects had identical
subscores, Rankings B and C each produced four pairs of ties and any
other ranking system based on ordering A_i, P_i, and R_i would have had
similar results. The rank of subject 11 varied the most as it was 19
on Ranking A and 11 on Ranking B.

Lucas' scoring system made it possible to develop three rankings
of the subjects and his measures were also used in the exploratory
ranking procedures of Part III. The association of Rankings A, B,
and C to the written test rankings is reported after other data
resulting from the interview and coding procedures are presented.

Audio Versus Video Taping 

The incorporation of video taping into the study prompted
questions about tape type differences in recorded information, in subjects'
performances, and in coding time. Data and observations are presented
to identify the differences between audio and video taping.

The physical differences in audio and video taping are immediately
apparent. Instead of a single tape recorder which the observer can
operate alone, video taping requires at least one camera, special
lighting, and a technical assistant. More than one pre-focused camera
or a single camera which can be regularly refocused is necessary to
effectively capture a subject's actions and writing. Compared to
audio taping, the array of equipment and technical assistance necessary
for video taping is more costly to the investigator and perhaps more
distracting to the subject.

The disadvantages of video taping were offset by the information
which would not have been captured on an audio tape. Interesting
physical actions such as a subject's smile, frown, or grimace, and
his nervous habits of scratching parts of his body or shifting his
position were recorded. Unspoken problem solving procedures were
the most important observations noted on video tape. For example,
subjects reread the problem or parts of it silently, but clearly
indicated their behavior by following the sentences with their eyes
or pencil, by moving their lips, or by asking a question immediately
after staring at the problem. Ninety-five occurrences of these
rereading behaviors which would not have been recorded on audio tape
were noted for the 16 video tape subjects. Furthermore, a comparison
of the observer's notes to the coded protocols revealed that 49 silent
rereadings were not recorded by the audio tape.

Another problem solving strategy which was not readily
discernable on audio tape occurred whenever subjects drew or modified a
diagram without orally indicating their exact actions. Problem 4 on
the IT was solved by five subjects through the sketch of a ladder,
but the coder used the completed diagrams and the subjects'
verbalizations to speculate on the sequence of modifications during all five
protocols. Routine computations were also subject to coder guessing
if the student did not adequately verbalize his actions. For example,
one subject performed seven written multiplications silently as she
attempted to divide 100 by 8.

The advantages of video tape for recording subject behaviors in
interview situations were clear without any need for statistical
comparisons. However, the questions about possible performance differences
due to video taping were answered by significance tests. The total
process sequence scores and the total solution times of subjects were used
as measures of performance differences.

From the pilot study results, the investigator suspected that
the presence of novel and distracting video taping equipment caused
the subjects to behave differently than if they were audio taped.
It was felt that video taped subjects spent less time solving the
interview test problems and that the haste of the video taped
subjects would result in lower scores. These suspicions were checked
statistically when two hypotheses were tested:

H1: The mean of video taped subjects' total interview
test scores equals the mean of audio taped
subjects' total interview test scores.
H2: The mean of video taped subjects' total solution
times on the interview test equals the mean of
audio taped subjects' total solution times on
the interview test.

The individual total scores are presented in Table 6.8 and the total
solution times are presented in Appendix I. The analysis of variance
statistics for hypotheses H1 and H2 are reported in Tables 6.9 and
6.10 respectively.



Table 6.9

ANALYSIS OF VARIANCE FOR TOTAL INTERVIEW TEST SCORES

Source         df      MS        F       p<

Treatments      1       .24     .006    1.00

Error          29     38.31

As Table 6.9 indicates, the null hypothesis H1 cannot be
rejected. The very low F ratio of .006 was an indirect result of the
close similarity of the video and audio taped subjects' scores. The
video taped subjects averaged 9.7 points with a standard deviation
of 5.8 while audio taped subjects achieved a mean of 9.9 with a
standard deviation of 6.2.

Table 6.10 

ANALYSIS OF VARIANCE FOR SUBJECTS* TOTAL SOLUTION 
TIMES ON THE INTERVIEW TEST 

Source         df*     MS        F       p<

Treatments      1     101.00    3.97     .10

Error          27      25.44

*Due to erasure of tape, two subjects' protocols could not
be timed.

As seen in Table 6.10, the significance level of .05 was not 
reached and the null hypothesis H2 is not rejected. However, the F 
ratio of 3.97 was significant below the .10 level and the analysis 
suggested that there were some treatment differences. The video taped 
subjects' solution time mean of 16.7 minutes compared to the audio 
taped subjects' mean of 13.0 minutes made it apparent that video taped 
subjects took about the same amount of solution time as did the audio 
taped subjects. 

Lucas suggested that coding video taped protocols took less time
than coding audio taped protocols. His observation was tested with
hypothesis H3:

H3: The mean of the coding times for video taped
subjects' protocols equals the mean of the
coding times for audio taped subjects'
protocols.

The coding time for each subject's protocol is presented in Appendix
I and the analysis of variance statistics are reported in Table 6.11.

Table 6.11 
ANALYSIS OF VARIANCE FOR CODING TIMES 

Source         df*     MS        F       p<

Treatments      1        .68    .002    1.00

Error          27     292.09

*Due to erasure of tape, two coding times could not be
measured.

As reported in Table 6.11, the extremely low F ratio of .002
did not reach the .10 significance level. Thus, the null hypothesis
H3 is not rejected and it appears that audio tapes and video tapes
require similar coding times. The sample means of 42.3 (VT) and 42.6
(AT) and sample variances of 17.3 (VT) and 15.8 (AT) indicate that
the coding time distributions were nearly identical.
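
As a quick arithmetic check, each F ratio in Tables 6.9 through 6.11 is the treatment mean square divided by the error mean square. The short sketch below recomputes the three ratios from the mean squares printed in the tables; the dictionary keys are labels chosen for this illustration.

```python
# (MS treatments, MS error) as printed in Tables 6.9, 6.10, and 6.11.
mean_squares = {
    "H1 interview test scores": (0.24, 38.31),
    "H2 solution times": (101.00, 25.44),
    "H3 coding times": (0.68, 292.09),
}

# F = MS(treatments) / MS(error), each with 1 degree of freedom
# in the numerator.
f_ratios = {name: ms_t / ms_e for name, (ms_t, ms_e) in mean_squares.items()}

for name, f in f_ratios.items():
    print(f"{name}: F = {f:.3f}")
```

The recomputed ratios agree with the tabled F values of .006, 3.97, and .002 to the precision reported.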

The difference in the means of audio taped and of video taped
subjects' solution times prompted a further analysis of coding times.
Direct observation of the data suggested that solution times were not
commensurate with coding time. Thus, solution time totals and coding
time totals across subjects were found for audio taping and for video
taping. The ratios of coding time to solution time were computed
for each tape type and the difference between the ratios was found.
The results are presented in Table 6.12.

Table 6.12 

COMPARISON OF CODING TIME RATIOS

                    Total Solution    Total Coding    Coding Time /
                    Time              Time            Solution Time

Video Tape
(15 subjects)*      251 minutes       635 minutes         2.53

Audio Tape
(14 subjects)*      182 minutes       597 minutes         3.28

Savings: 3.28 - 2.53 = .75 minutes per one minute of tape

* Due to erasure of tape, one coding time for each tape
type could not be measured.

As indicated in Table 6.12, the video taped protocols lasted 251 
minutes and took 635 minutes to code while 182 minutes of audio taped 
protocols took 597 minutes to code. Thus, one minute of audio tape 
took 3.28 minutes to code and one minute of video tape took only 2.53 
minutes to code. The .75 minutes difference represents a savings of 
approximately 22 percent of the audio coding time on a minute of tape. 
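
The ratio comparison is simple enough to verify directly. The sketch below recomputes the coding-to-solution-time ratios and the relative savings from the totals reported in Table 6.12; the variable names are chosen for this illustration.

```python
# Totals in minutes, as reported in Table 6.12.
video_solution, video_coding = 251, 635
audio_solution, audio_coding = 182, 597

# Minutes of coding needed per minute of taped protocol.
video_ratio = video_coding / video_solution
audio_ratio = audio_coding / audio_solution

# Absolute savings per minute of tape, and savings as a
# fraction of the audio coding time.
savings = audio_ratio - video_ratio
fraction_saved = savings / audio_ratio

print(round(video_ratio, 2), round(audio_ratio, 2),
      round(savings, 2), round(fraction_saved, 3))
```

The recomputed fraction falls between 22 and 23 percent, in line with the approximate figure quoted in the text.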

The data and observations resulting from the interviews and coding
procedures were used to seek answers to principal and secondary
questions of the study. However, the central concern of the study depended
upon the correlation of the rankings identified earlier in this chapter.
The correlations and exploratory statistics are reported next.

Statistical Analyses of Rankings 

The feasibility of using a written instrument as a substitute
for the complex interview and coding procedure depended upon the
relationships between the written tests and the interview test.
Two written tests, the WT and the WT2, were administered and three
rankings, A, B, and C, were developed from the IT. The exploratory
procedures which were used to seek additional rankings are explained
after the initial statistics are reported.

Relationships of the Written and Interview Tests
Two comparisons were possible after the written and interview
tests were scored and their rankings were developed. A product-moment
correlation coefficient r_xy was computed between the raw scores (number
correct) on the written tests and the interview test total and
subtotal scores used for developing each ranking. Thus, the correlations
involving Ranking A were based on the total IT scores while correlations
involving Ranking B used the IT subtotals for Approach and correlations
involving Ranking C used the subtotals for Plan. For each correlation
coefficient, a hypothesis that the population statistic rho_xy equals zero
was tested by a t test with N - 2 degrees of freedom.
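
The t test for a correlation coefficient can be illustrated with the standard statistic t = r * sqrt((N - 2) / (1 - r^2)). The sketch below assumes N = 31 (the number of subjects listed in Table 6.8) and applies the formula to two of the coefficients reported for this study, .61 and .40. The critical values are the usual two-tailed table entries for 29 degrees of freedom, and the function name is hypothetical.

```python
import math


def t_for_correlation(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


N = 31  # assumed: the 31 subjects listed in Table 6.8

# Two-tailed critical values of t for 29 degrees of freedom,
# taken from standard t tables.
T_CRIT_05 = 2.045
T_CRIT_001 = 3.659

t_high = t_for_correlation(0.61, N)
t_low = t_for_correlation(0.40, N)

print(round(t_high, 2), t_high > T_CRIT_001)
print(round(t_low, 2), T_CRIT_05 < t_low < T_CRIT_001)
```

Under these assumptions a coefficient of .61 exceeds the .001 critical value, while .40 clears only the .05 value, which matches the two significance levels flagged in the table of correlations.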

In addition to the correlation between scores, the relationship
between the rankings developed from the tests was also measured.
Kendall's tau (Hays, 1963) with ties was computed for the association
between the rankings and the significance level of tau was found by
computing z values. Because of ties within rankings, Kruskal's gamma
statistic was computed to provide a simpler interpretation of Kendall's
tau. The correlations and ranking statistics for the pairs WT and
Ranking A, WT and Ranking B, WT and Ranking C, WT2 and Ranking A,
WT2 and Ranking B, WT2 and Ranking C, and (WT + WT2) and Ranking A
are presented in Table 6.13.

Table 6.13

CORRELATION AND RANKING STATISTICS FOR THE
INTERVIEW TEST AND THE WRITTEN TESTS

                               r       tau     p(tau)    gamma

WT & Ranking A                .61*     .44      .001      .48
WT & Ranking B                .40**    .33      .007      .34
WT & Ranking C                .59*     .39      .002      .41
WT2 & Ranking A               .64*     .49      .001      .52
WT2 & Ranking B               .48**    .38      .002      .40
WT2 & Ranking C               .61*     .45      .001      .46
(WT + WT2) & Ranking A        .68*     .50      .001      .52

*  Significant at the .001 level in two tailed t test of rho_xy = 0
** Significant at the .05 level in two tailed t test of rho_xy = 0

As reported in Table 6.13, none of the correlation coefficients
between the seven pairs of written and interview test scores attained
the desired minimum of .71 although the combined scores of the WT and
the WT2 produced an encouraging correlation coefficient of .68 with
the total IT score. The Approach subscore used for Ranking B produced
the lowest correlations; the correlations with the WT score and the WT2
score were .40 and .48 respectively. Two pairs of scores, WT &
Ranking A and WT2 & Ranking C, each resulted in a correlation of
.61. Statistically, all seven correlation coefficients resulted in
t test values which were significant at the .05 level. Thus, the
hypothesis that no correlation exists between written and interview
test scores was rejected.

The associations between the rankings reported in Table 6.13
resulted in values which appeared to be low but which were
statistically significant. Kendall's tau values ranged from a low of .33
for WT & Ranking B to a high of .50 for (WT + WT2) & Ranking A.
However, the probabilities for all seven tau values were below .01
and four probabilities fell at the .001 chance level. Kruskal's
gamma statistic ranged from .34 for WT & Ranking B to .52 for two
pairs of rankings, WT2 & Ranking A and (WT + WT2) & Ranking A. The
gamma values indicated that if two subjects had untied rankings,
the probability was favorable that their ranks would have the same
ordering on the written and on the interview tests.

Exploratory Procedures
As indicated in Chapter IV, exploratory statistical analyses,
namely latent partitioning and clustering, were to be used to search
for underlying patterns among subjects and to possibly produce other
ranking schemes. Because the computer program for latent partitioning
was not available, another pattern seeking program called
multidimensional scaling was substituted. A similarity measure D (Figure
6.2) based on subscores for Approach, Plan, and Result was computed
for each pair of subjects and was used in both analyses. The matrix
of resulting values was organized by incorporating the multidimensional
scaling data and is presented in Appendix L.

D   = Distance Measure
A_j = Total Approach Score of Subject j     Z_{A_j} = A_j Normalized
P_j = Total Plan Score of Subject j         Z_{P_j} = P_j Normalized
R_j = Total Result Score of Subject j       Z_{R_j} = R_j Normalized

D(S_i, S_j) = (Z_{A_i} - Z_{A_j})^2 + (Z_{P_i} - Z_{P_j})^2 + (Z_{R_i} - Z_{R_j})^2

Notes: 1. D(S_i, S_j) >= 0
       2. D(S_i, S_i) = 0
       3. D(S_i, S_j) = D(S_j, S_i)

Figure 6.2. Similarity Measure Formula
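
The similarity measure of Figure 6.2 can be sketched as a sum of squared differences between normalized subscores. The report does not state how the subscores were normalized; the sketch below assumes a z-score transformation (subtracting the mean and dividing by the population standard deviation over the subjects supplied), and the function names are hypothetical. The five subtotal triples are those of subjects 15, 26, 4, 27, and 24 in Table 6.8.

```python
from statistics import mean, pstdev


def normalize(values):
    """z-scores of a list of subscores (assumed normalization)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def distance_matrix(subtotals):
    """D(S_i, S_j): sum of squared differences of the normalized
    Approach, Plan, and Result subtotals for every pair of subjects."""
    columns = [normalize(column) for column in zip(*subtotals)]
    rows = list(zip(*columns))
    return [[sum((a - b) ** 2 for a, b in zip(ri, rj)) for rj in rows]
            for ri in rows]


# (A, P, R) subtotals of subjects 15, 26, 4, 27, and 24 (Table 6.8).
subtotals = [(6, 10, 8), (5, 8, 7), (5, 7, 6), (5, 7, 6), (0, 0, 0)]
d = distance_matrix(subtotals)

print(d[2][3])                # identical subscores give distance 0
print(d[0][1] < d[0][4])      # subject 15 is closer to 26 than to 24
```

Because D is a sum of squares, the three properties listed in the notes to Figure 6.2 (non-negativity, zero self-distance, and symmetry) hold automatically, whatever normalization is chosen.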

The Guttman-Lingoes multidimensional scaling program (Lingoes, 1973)
searches for underlying patterns or structures among the similarity
measures. The program then represents the structure in a spatial
model by assigning coordinates to the objects (subjects) and computes
stress values to measure the agreement between the order of the spatial
distances and the order of the similarity measures. Higher agreement
is indicated by low stress values. A second measure, the coefficient
of alienation, deals with the type of monotonicity criterion for the
relationship between distance and similarity measures. The coordinates,
stress values, and coefficients of alienation for one, two, three,
and four dimensions were produced by the Guttman-Lingoes program.
Coordinates and accompanying values for two through four dimensions
are listed in Appendix M. The one dimension results closely paralleled
earlier rankings and are discussed here. Table 6.14 presents the one
dimension scaling coordinates in an order which permitted a ranking
to be imposed.

As can be seen in Table 6.14, the multidimensional scaling program
assigned subject 15 one extreme coordinate of -100.000 and assigned
subject 29 a coordinate of 100.000. The parallel to Ranking A was
immediately obvious and by assigning Rank 1 to subject 15, Rank 2 to
subject 26, and continuing until Rank 31 was assigned to subject 29,
a ranking very similar to Ranking A was obtained. Kendall's tau of
.96 and Kruskal's gamma statistic of .99 verified that the agreement
between the two rankings was almost perfect and that little
information was lost by basing Ranking A on total scores. Conversely, not
much information was gained by using the subscores. Kruskal's stress
measure of .11557 indicated that there was fairly strong agreement
between the rank orders of the spatial distances and of the similarity
measures. A perfect coefficient of alienation (.00000) resulted from
weak monotonicity (distance from coordinate i to coordinate j <=
distance from coordinate k to coordinate l whenever the similarity
of subjects i and j < the similarity of subjects k and l) requirements.

Table 6.14

ONE DIMENSIONAL SCALING COORDINATES
AND A RESULTING RANKING

Kruskal-Guttman-Lingoes-Roskam Smallest Space Coordinates
for M = 1 (Weak Monotonicity)

Variable                          Variable
(Subject)   Coordinate   Rank     (Subject)   Coordinate   Rank

  15        -100.000      1         11          28.709     17
  26         -66.349      2         20          31.710     18
   9         -62.808      3          5          32.155     19
  27         -51.554      4          3          36.062     20
  19         -51.124      5         30          37.456     21
   4         -50.628      6         10          42.423     22
  25         -43.609      7         13          48.544     23
   1         -19.661      8        7,18         55.265     24.5
  21          -8.553      9         14          61.291     26
  12           5.851     10         16          65.620     27
 17,22        17.527     11.5       28          68.045     28
  31          19.234     13          6          71.542     29
  23          23.108     14         24          98.692*    30
  2,8         25.569     15.5       29         100.000     31

*Error: Subjects 24 and 29 had identical subscores. Therefore,
they should both have coordinates of 100.000 and
ranks of 30.5.

Kruskal's stress = .11557 in 6 iterations
Guttman-Lingoes' coefficient of alienation = .00000

Johnson's (1967) max clustering algorithm was the second
exploratory procedure used to group subjects according to some structure
underlying the similarity measures. The program defines a sequence of
partitions of a set of objects and uses the similarity values to
determine "diameters" of the subsets. The max procedure attempts to
construct hierarchical partitions which contain subsets of minimum
diameter and assigns a partition rank to each pair of objects.
Goodman and Kruskal's (1954) gamma is computed to measure the
agreement between the rank order of object pairs obtained from the
partition hierarchy and the rank order of the pairs' similarity values.
Figure 6.3 presents the iterative steps of the clustering algorithm
and illustrates the partitions of subjects who were homogeneous in
some way. Appendix N contains the gamma values which correspond to
each iteration.

As seen in Figure 6.3, the clustering algorithm started with
each subject as a distinct group and at each iterative step, joined
two groups which were most similar. Thus, iteration 1 joined
subjects 24 and 29, iteration 2 joined subjects 4 and 27, and iteration 3
joined subjects 17 and 22. The iterations continued through
iteration 30 which produced one group composed of all 31 individuals.
Of particular interest is the partition formed by iterations 28 and
29. At this level, the entire group of subjects is divided into two
disjoint subsets. The subset under iteration 28 contains subjects 1,
21, 4, 27, 26, 9, 19, 25 and 15 while the subset under iteration 29
contains the remaining subjects. Further observation of Figure 6.3
indicates that iteration 29 is partitioned into the disjoint subsets
of iterations 27 and 25. The subset of iteration 27 contains subjects
2, 8, 3, 10, 13, 5, 30, 17, 22, 23, 20, 11, 31, and 12 while the
subset of iteration 25 has subjects 6, 16, 7, 18, 14, 28, 24, and 29
as its members.
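
The joining procedure described above can be sketched as agglomerative clustering in which the distance between two groups is their largest pairwise distance, so that each merge produces the subset of smallest possible diameter. This corresponds to what is now usually called complete linkage. The sketch is an illustration of that idea rather than a reconstruction of the program used in the study: the function name is hypothetical, ties are broken by taking the first minimal pair encountered, and the five one-dimensional points are invented for the example.

```python
def max_method_clusters(dist):
    """Agglomerative clustering by the max (diameter) criterion.
    dist is a square matrix of pairwise distances; returns the merges
    in order as (members of group 1, members of group 2, diameter)."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Diameter of the union = largest distance between members.
                diameter = max(dist[a][b]
                               for a in clusters[i] for b in clusters[j])
                if best is None or diameter < best[0]:
                    best = (diameter, i, j)
        diameter, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), diameter))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical one-dimensional scores for five objects
# (illustrative only, not data from the study).
points = [0.0, 0.0, 1.0, 5.0, 6.0]
dist = [[(p - q) ** 2 for q in points] for p in points]

merges = max_method_clusters(dist)
for left, right, diameter in merges:
    print(left, right, diameter)
```

As in the iterations described for Figure 6.3, objects with identical scores are joined first at diameter zero, and the final merge unites the two most dissimilar groups.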



[Figure 6.3: diagram of the iterative steps of the clustering
algorithm, showing the partitions of subjects]


Inspection of the subsets of iterations 29, 28, 27, and
25 revealed an identifiable pattern which was strongly related to
the ranking scheme developed from one dimensional scaling. The
subsets of iterations 28 and 29 corresponded to the first nine
subjects (15 through 21) and the last twenty-two subjects (12 through
29) as ranked in Table 6.14. Further observation of the table
indicated that partitions 28, 27, and 25 divided the subjects into three
disjoint groups which corresponded to the first nine (15-21), the
next fourteen (12-13), and the final eight (7-29) ranked subjects,
respectively.

Iteration 28 can be traced backward through the sequential
separations of subject 15 (rank 1) and subjects 1 and 21 (ranks 8
and 9) before the clustering loses consistency with the scaling
seriation. When subjects 19 (rank 5) and 25 (rank 7) are separated from
the remaining six subjects (26, 9, 27, 19, 4, and 25 respectively),
the clustering configuration skips subject 4 which has rank 6.

Dimensions two, three, and four of the scaling procedure were
difficult to interpret and were inconsistent with the clustering
results. For example, in two dimensions, the exploratory procedures
displayed agreement on the horizontal axis (vector 1) as the scaling
resembled the seriation of one dimension. However, the vertical
dimension (vector 2) produced a wide separation between subjects 31
and 11, the students who were paired at iteration 11 in the clustering
algorithm. Since subject 31 had subscores (4, 3, 2) and subject
11 had similar subscores (4, 3, 1), and no other evidence could
account for the discrepancy, no further relationships or
interpretations were sought beyond one dimension.

The results of the exploratory analyses were considered encouraging
for future problem solving research. The similarity measure D
was different from the measure used to produce Ranking A; however,
the underlying structure found by multidimensional scaling was
similar to the ranking structure imposed by total scores. Furthermore,
the clustering procedure reaffirmed the results of the scaling
procedure by producing partitions which were highly consistent with
the one-dimensional ranking scheme.

Summary of Chapter VI 

The written tests were completed without time being a factor
and the students did not have difficulties following the test format.
However, the reliability measures of the written tests were
not sufficiently high for a correlation of .71 between tests to be
obtained. Though the written and interview tests failed to attain
the minimum correlation coefficient established as a feasibility
criterion, the .68 correlation of the WT-WT2 combined score with
the IT total score and the high agreement between written and
interview test ranks were encouraging.

The revised coding scheme and Lucas' scoring system were applied
to the protocols with good intercoder agreement and three
logical ranking schemes were developed from the results. The IT
scores did not produce the desired correlation coefficient with the
written test scores.

Complications arose during the interviews. Nervousness which
could be attributed to the experimental setting was not unexpected,
but the inability of subjects to think aloud raised questions about
the validity and reliability of the thinking aloud procedure.
Video taped protocols held two advantages over audio taped records:
they recorded important silent problem solving behaviors and they
took about 22 percent less time to code.

The conclusions and implications which were made from the data 
are discussed in Chapter VII. 



Chapter VII 
CONCLUSION 



Introduction 

After giving a summary of the study, this chapter presents a
discussion of the limitations and conclusions. The implications for
problem solving evaluation and recommendations for future research
conclude the chapter.

Summary

The main purpose of this study was to explore the feasibility
of using a written test to assess and rank seventh graders'
mathematical problem solving achievement. The feasibility of the written
test was to be judged on its physical dimensions, its statistical
characteristics, and its agreement with the results of the complex
thinking aloud procedure.

Thirty-one subjects were asked to think aloud during mathematical
problem solving interviews which were taped. The recorded protocols
were coded and scored to provide a valid assessment of the subjects'
achievement. Three rankings were developed from the scores and
compared to the ranking determined by the number correct on a 20 item
written test. The length, format, and reliability criteria of the
written test were met, but the correlation coefficients between the
written and interview test scores did not reach .71. However, one
coefficient approached the expected value and the order of the
rankings had high statistical agreement.

The effectiveness of the thinking aloud procedure for capturing
mathematical problem solving was evaluated and serious doubt was cast
on its reliability and validity for use with seventh graders. A
revised coding scheme described the problem solving behaviors well and
was applied with high intercoder agreement, but the subjects' thinking
aloud abilities and reactions suggested that the procedure was not
capturing their genuine mathematical problem solving tactics.

Secondary questions about recording and coding procedures arose
during a pilot study and were included in this investigation. It was
found that video taping was advantageous for recording subjects'
unspoken behaviors and that less time was needed to code video tape
than to code audio tape.

Multidimensional scaling produced an IT subject ranking which 
agreed closely with the one developed from total scores. The cluster- 
ing procedure illustrated the grouping of subjects and reinforced the 
agreement between the other two rankings. 

Limitations

Though care was taken to exercise as much control and to permit
as much generalization as possible, each part of this exploratory study
contained factors which limited the interpretations. The limitations
and possible corrective measures are discussed here.

The meanings of "mathematical problem" and "mathematical problem
solving" were similar in definition to Lucas' and were similar in
spirit to Kilpatrick's. Yet, the definitions used in this study must
be considered unique, thus limiting the generalizability of the results.

The school selected for this study was a parochial school, but
the results of the WT2 on a larger population indicated that the sub- 
jects were fairly representative in achievement. However, precautions 
must be taken in generalizing beyond the school's population because 
the interview and statistical results were derived from a select sub- 
set of the school's seventh-graders. A random choice of students and 
schools in a larger population would have permitted a corresponding 
increase in generalizability. 

The latitude of the interpretation also depended upon the
reliability and validity of the instruments and procedures. Though most
measures were acceptable, the arbitrary criterion levels and
inconsistency of coder agreement measures could make coder reliability
suspect. A larger number of coders and observations would establish
more stable agreement measures.

The results of the thinking aloud procedure were assumed to be
valid representations of a subject's problem solving achievement.
However, observations made during the interviews indicated that the
subjects had difficulties thinking aloud in addition to the usual
reactions to an experimental setting. The combination of these
observations raised serious questions about the thinking aloud procedure and
only further research can determine the effects of the observed behaviors.

The exploratory clustering and multidimensional scaling procedures
were subject to personal interpretations, so the results of the analyses
must be treated accordingly. When the procedures and interpretations
are defined more clearly, the reliability of the resulting information
and conclusions will increase.

Conclusions 

This section discusses the conclusions of the study with references 
to the main and secondary questions which were to be answered. The 
data and observations presented in Chapter VI were used to make the
judgments and decisions discussed below. 

The physical and statistical qualities of the written tests, the
WT and the WT2, indicated that the instruments were suitable for
administering to seventh graders in the classroom. Groups A and B in
School 1 averaged less than 27 minutes for completion times on the
WT2 and it was assumed that no great deviation would occur with other
forms of a written test or with other groups of seventh graders.
According to the results on the written tests, the directions were
clear and easy to follow although the items were difficult to answer.
The students filled in the proper spaces with their answers and did not
hesitate to omit items which they did not understand or could not solve.
The average reliability of both written tests across all groups was an
acceptable .79. The small solution time average indicated that a longer
written test could be administered in an hour without making the test
a speed test. Assuming progress at the same rate, a 25 item written
test should take about 34 minutes to solve and, according to the
general Spearman-Brown formula (cf. Ebel, 1972, p. 413), it should
have a reliability of .83.
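
The projection above can be reproduced with the general Spearman-Brown formula, r_k = k r / (1 + (k - 1) r), where k is the ratio of new to original test length. The sketch below is illustrative only: the function name `spearman_brown` is hypothetical, and the inputs are the rounded figures quoted in the text (20 items, reliability .79, 27 minutes). With the rounded .79 the formula returns about .82; the .83 quoted above presumably reflects unrounded values.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability of a test lengthened by `length_factor`."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


r_20 = 0.79                                # average reliability, 20 item tests
r_25 = spearman_brown(r_20, 25 / 20)       # projected reliability, 25 items
r_40 = spearman_brown(r_20, 40 / 20)       # projected reliability, 40 items
minutes_25 = 27 * 25 / 20                  # solution time at the same rate

print(round(r_25, 2), round(r_40, 2), minutes_25)
```

The same function covers the 40 item test discussed later in the chapter, for which the projected reliability is about .88 under these assumptions.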

The feasibility of the written test was chiefly determined by
its ability to predict seventh graders' problem solving achievement
scores and ranks as measured by the IT. The product-moment
correlation coefficient was .61 for the IT and WT scores and .64 for the IT
and WT2 scores. Though both values were highly significant (p<.001)
against H0: ρ = 0, neither written instrument attained the minimum
correlation of .71 which was necessary to account for at least 50%
of the variance between written and interview test scores. The IT
subscores produced similar results when correlated with the written
tests. Thus, the written test must presently be declared not feasible
for the purpose of predicting mathematics achievement as measured by
the thinking aloud procedure and coding scheme.
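
The .71 criterion follows from squaring the correlation: the squared coefficient is the proportion of variance in one set of scores accounted for by the other. As a worked check using only the coefficients reported above:

```latex
\[
r = .71 \;\Rightarrow\; r^{2} = .5041, \qquad
r = .61 \;\Rightarrow\; r^{2} = .3721, \qquad
r = .64 \;\Rightarrow\; r^{2} = .4096 .
\]
```

Only a coefficient of about .71 or higher clears the 50 percent mark; the observed .61 and .64 account for roughly 37 and 41 percent of the variance.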

The second main question of the study was, "Is it possible to
assess, separate, and rank seventh graders according to their
problem solving protocols?" The answer appears to be positive. A
variation of Lucas' coding system was applied with a high degree of
agreement (.83 across the variables, see Table 6.7) and reliability
(.8Q). The variables S, DX, se, and produced low reliability
measures, but the disagreements which caused the low values did not
seriously affect the IT scores. Rankings A, B, and C were logically
derived from the scores awarded by Lucas' point system and provided
high rank order agreement measures. The scaling and clustering
analyses verified that the order imposed by Ranking A was consistent
with the similarities and patterns which were detected among the
subjects.

Probably the most important outcome of this study resulted as
the answer to the first question was sought. The question was,
"How well does the thinking aloud procedure and related coding scheme
capture and classify the mathematical problem solving behaviors of
seventh graders?" and the answer appears to be "not very well."
As indicated in the previous paragraph, the coding scheme was
applied with acceptable agreement and resulted in logical ranking
schemes; however, the behaviors of the students during the thinking
aloud interviews raised critical questions about the reliability
and validity of the information recorded in the protocols. The
seven subjects (Table 6.6) who displayed obvious nervous habits
were not likely to have performed as normally as those who were not
nervous. Seven out of 31 is already a high ratio and if half of the
subjects who gave subtle nervous indicators were indeed nervous, then
almost one-third of the subjects were not performing normally. The
eight subjects who were rated either "Fair" or "Poor" at thinking
aloud add to the suspicion that the procedure did not adequately
capture the problem solving behaviors of some subjects and that it may
not be a highly valid or reliable method to use with seventh graders.

The differences in audio and video taping have indicated a
distinct advantage for the latter because of its ability to detect
silent rereading indicators, diagrams and alterations, and written
computations. Future investigators need to decide if the extra
information is worth the additional expense of video taping.

Subjects in the video taping situation did not react much
differently than students who were audio taped. The occurrences of
comments, retrospections, nervous subjects, and fair or poor
verbalizers were approximately equal in each procedure. The audio
taped subjects produced more silent pauses, but the video taped
subjects took significantly more time (p<.10) to attempt the IT.
The scores of each group were nearly identical and produced no
significant difference. It appears that although video taping
requires extra equipment which could be distracting, the subjects'
behaviors, performance times, and achievement scores were not affected
any differently than if the students had been audio taped. However,
it must be remembered that both procedures may have altered the
subjects' behaviors and performances equally.

Implications for Mathematical Problem Solving Assessment 

The main purpose of this study was to explore the feasibility of
designing a written test to predict mathematical problem solving
achievement of seventh graders as measured by the Interview Test. The
exploration raised other questions which were included in the study.
Possible answers are presented with the recommendations which resulted
from the observations and data.

The chief feasibility criterion for the written test was not met
although the correlation coefficients were statistically significant.
Assuming the thinking aloud procedure produces a valid assessment
of students' problem solving achievement, a higher correlation is
necessary before the written test scores can be used as a
substitute or a predictor; however, the highly significant correlation
coefficients and the extremely low probability of Kendall's tau
values occurring by chance indicated that the written tests could be
used to make scoring and ranking predictions with some confidence.
For example, given that student A ranked above student B on a written
test, the chances are about 45% greater that student A ranked above
student B on the IT than that student A ranked below student B on
the IT.
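
The "45% greater" reading rests on the definition of Kendall's tau as the proportion of concordant pairs minus the proportion of discordant pairs, so a tau near .45 means a randomly chosen pair of students is that much more likely to be ordered the same way on both tests than the opposite way. The sketch below illustrates the computation under stated assumptions: it uses the simple tau-a form with no correction for ties, the function name `kendall_tau_a` is hypothetical, and the two rankings are made-up data, not the study's.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1      # pair ordered the same way on both tests
        elif sign < 0:
            discordant += 1      # pair ordered in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks of five students on a written and an interview test.
written_ranks = [1, 2, 3, 4, 5]
interview_ranks = [2, 1, 3, 5, 4]

tau = kendall_tau_a(written_ranks, interview_ranks)
print(tau)
```

In this toy example 8 of the 10 pairs agree and 2 disagree, so the same-order outcome is 60 percentage points more likely than the reverse.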

The sum of the WT and the WT2 scores resulted in a correlation
coefficient of .68 with the IT score. Since this value indicates
that over 46 percent of the variance can be accounted for by
knowing one test score, it appears that an appropriately constructed
written test with at least 40 items might produce the .71 minimum
correlation coefficient. The lengthened test would likely require
more than one hour to complete and would probably need to be given
in two parts to avoid student fatigue, but it would remain quicker
and easier for teachers to administer and score than are the complex
thinking aloud and coding procedures.

The critical observations of the thinking aloud procedure are
not unique. Kilpatrick (1967) was aware of possible interference
or interaction of speech and thinking when he had his eighth grade
subjects think aloud, but he did not indicate that any of his
subjects had difficulty verbalizing while they worked. Menchinskaya
(cf. SMSG, 1969) observed that ninth graders and adults with a
secondary school mathematics education were able to think aloud easily
and that external speech did not hinder them in solving a problem.
However, she found that first, fourth, and fifth graders had
difficulty verbalizing as they solved arithmetic problems and they commented
on the interference it caused in their thinking. She felt that
reasoning processes changed and performance deteriorated when these students
were required to think aloud. Pereira (1973) made similar
observations after he had 11-12 year old girls verbalize while trying to
discover the rules of a mathematical structure. He found that
subjects who worked in silence during a physical mathematical learning
activity (pressing buttons on a machine) performed better and
retained more than subjects who verbalized overtly while learning. The
evidence from the above investigations and from this study strongly
suggests that the thinking aloud procedure may not cause much
interference with adults and youths who have attained mental maturity,
but that the interference of overt speech with thinking increases as
the mental maturity of the subjects decreases.

The exploratory analyses tried in this study have some potential
for problem solving research. Clustering and multidimensional scaling
produced graphic data which made groupings visibly apparent and
detected structural patterns which were not apparent. In this study,
the one-dimensional scaling results and the clustered groups reinforced
the structure imposed by Ranking A. Future analysis may relate other
dimensions to patterns among the subjects' problem solving processes.

The final implication is an outcome of the many plans, changes,
observations, and facts which resulted during this investigation.
Mathematical problem solving, being the complex behavior that it is,
will not be easy to measure or assess with a single instrument. It
appears that a written test may be feasible for predicting a
subject's interview test score and ranks, but that further investigation
by the thinking aloud procedure may be necessary to evaluate
individual processes and strategies, assuming that the subject is able to
verbalize while thinking. In situations where it is applicable,
the thinking aloud procedure sometimes provides an incomplete record.
Lucas (1972) suggested that retrospection be used to procure
additional information about the missing behaviors although care would
have to be taken not to give the subject any training or heuristic
hints if such procedures were used. For the subjects who cannot
verbalize well or who find that excessive interference occurs,
some other procedure will have to be used to identify and record
their mathematical problem solving processes.

Recommendations for Future Research 

Like most exploratory studies, this investigation raised more
questions than it answered. Future research could extend the efforts
of this study or could investigate the new issues which were raised.
Suggestions are included as the recommendations are discussed below.

The written test scores did not achieve a .71 correlation
coefficient with the interview test scores, but the results were close
enough to recommend that additional efforts be made to reach the
desired coefficient level. The initial step is to increase test
reliabilities and there are five procedures which could be tried:

1) Replicate the study with a large population.

2) Use a longer form of the written test. A two part
test with a total of 40 or more items should be
tried.

3) Use more mathematical problems on the interview
test. Since the seventh graders took approximately
13 minutes to attempt the six IT problems, two or
three more items could be included without tiring
the subjects.

4) Use a revised scoring system. Lucas' system resulted
in numerous ties in subjects' total scores and
subscores. Scoring which attaches large weights to
Approach, Plan, and Result would better differentiate
among subjects and might improve the correlation
between written and interview test scores. For
example, a subject might be awarded 0-2 points for
Approach, 0-3 points for Plan, and 0-2 points for
Result.

5) Screen the WT items and IT problems to remove
those which have a poor correlation with test totals.
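
The screening in step 5 can be sketched as an item-total correlation check. This is an illustration only: it assumes right/wrong (1/0) item scores, correlates each item with the total of the remaining items, and uses a cutoff of .20 that is purely hypothetical, as are the function names and the five-subject data.

```python
import math


def pearson(x, y):
    """Product-moment correlation; returns 0.0 if either list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def item_total_correlations(scores):
    """Correlate each item with the total of all other items."""
    n_items = len(scores[0])
    result = []
    for k in range(n_items):
        item = [row[k] for row in scores]
        rest = [sum(row) - row[k] for row in scores]
        result.append(pearson(item, rest))
    return result


def screen_items(scores, threshold=0.2):
    """Indices of items whose item-total correlation meets the cutoff."""
    correlations = item_total_correlations(scores)
    return [k for k, r in enumerate(correlations) if r >= threshold]


# Hypothetical data: five subjects (rows) by three items (columns).
scores = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]

kept = screen_items(scores)
print(kept)
```

In the toy data the third item runs against the other two and is dropped; an investigator would choose the cutoff to suit the test length and sample size.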

The interview test rankings developed in this investigation
shared a strong rank order agreement with the written test rankings.
However, if a higher level of confidence is desired, new rankings
might be developed. Subjects' performance on individual IT items
and item difficulty could be considered in the development of new
ranking schemes.

The thinking aloud procedure needs to be thoroughly examined
before it is used for recording and assessing subjects' mathematical
problem solving behaviors. Systematic application beginning with
first graders and continuing through adults should detect general
differences in ability to think aloud as the age or mental maturity
of the subjects increases. A systematic approach might also uncover
clues to explain why two subjects of the same age can vary greatly
in their ability to verbalize. Future investigations must consider
the effects of age level and individual differences before deciding
to use the thinking aloud procedures.

The audio and video taping differences in recorded data were
apparent. However, the differences in solution times and the
differences in coding time ratios were based upon seventh graders' protocols
which were short and which contained relatively simple behaviors.
Loomer's college students' solution times were much longer and the
complex behaviors were more difficult to code. These observations
raised suspicion that the differences in coding time ratios for the
college students' protocols may not be consistent with the results
of this study. Future studies might compare audio and video taping
at different age levels to verify the solution and coding time
differences.

Finally, future research should further examine the relationship
of the multidimensional scaling and clustering procedures to
mathematical problem solving assessment. In particular, the second and
third dimensions of the scaling procedure need to be studied in order
to see if problem solving behaviors, patterns, or factors can be
related to them.


Comments 

A simple instrument is needed to give educators a preliminary
assessment of students' mathematical problem solving achievement.
The written instrument which was devised for the purpose did not
achieve the desired correlation coefficients, but the results came
sufficiently close to make the investigator confident that the goal
can be reached. Further research should complete the development of
the written test and search for improved methods of assessing
students' mathematical problem solving achievement.


Appendix A


KILPATRICK'S CODING FORM FOR PROBLEM-SOLVING PROTOCOLS

Subject No. ____   Coder ____   Tape Readings ____
Problem No. ____   Date ____   Time ____   Score ____

PREPARATION                                COMMENTS ABOUT SOLUTION
Draws figure                               Questions existence of solution
Changes condition (spec./gen.)
Performs exploratory manipulation          Questions necessity/relevance of information

RECALL
Recalls same or related problem            Expresses uncertainty about final solution
Uses related problem in solution           Says he doesn't know how to solve problems
Says he has forgotten procedure

PRODUCTION                                 REQUESTS
Uses successive approximation              Requests assistance, more information
Misinterprets problem                      Requests verification
Selects solution on irrelevant basis

EVALUATION
Checks solution by subst. in equation      Expresses enjoyment, liking for problems
Checks that solution satisfies condition   Expresses distaste, dislike for problems
Checks solution by retracing steps         Admits confusion
Checks solution is reasonable/realistic    Shows concern for performance
Derives solution by another method         Says procedure unorthodox
                                           Says he can't explain result

EXECUTIVE ERRORS        Tallies    Total
Count/arith. oper.
Alg. manipulation
Other slips

PROCESS SEQUENCE:



Appendix A (cont'd)
Process Symbols

PREPARATION

R = Reading and trying to understand problem

PRODUCTION

D = Deduction from condition
E = Setting up equation
T = Trial and error

EVALUATION

C = Checking solution

OUTCOMES OF PRODUCTION (used in conjunction with D, E, and T)

1 = Incomplete
2 = Impasse
3 = Intermediate result
4 = Incorrect result
5 = Correct result

MODIFIERS

Bar over symbol = Structural error in process (used only with
symbols for production)
Underlined symbol = Difficulty (hesitation, repetition) in process

PUNCTUATION MARKS

, Inserted between successive processes
/ Work stopped without solution
. Work stopped with solution



Appendix B 
LUCAS' PROCESS-SEQUENCE CODES 
Process Symbols 

R = reads the problem
S = separates/summarizes data
Mf = introduces model by means of a diagram
• = modifies existing diagram
Mf = introduces diagram with coordinate system imposed
DS = deduction by synthesis
DA = deduction by analysis
T = trial and error, successive approximation
An = reasoning by analogy
Me = model introduced by means of equation, expression
or other relationship
Alg = algorithmic process
N = not classifiable
C = checks the result
Vs = varies the process (condenses/outlines, tries
different method)
Vm = varies the problem (by analogy, by changing
conditions)

Outcomes of DS, DA, T Processes

1 = abandons process
2 = impasse
3 = incorrect final result
4 = correct final result
5 = intermediate result (correct or incorrect)

Punctuation Marks

- (dash) hesitation of approximately 2 units (30 seconds)
( ) scope of DS, DA, or T process
, inserted between successive processes
/ stops without solution
. stops with solution (correct or incorrect)

Errors

^p over process symbol = structural error in process

4* over process symbol = executive error in process

* (asterisk over error symbol) = previous error of
type indicated was corrected


Appendix C

INTERVIEW TEST ITEM POOL WITH ANSWERS

1. A farmer has a total of 39 chickens and cows in his
barn. If you counted all the legs of these animals,
you would get 100 legs. How many chickens does he
have?

(28)

2. The average weight of Billy, Willy, and Ted is 125
pounds. Billy weighs 110 pounds and Willy weighs
120 pounds. How much does Ted weigh?

(145 pounds)

3. Mr. Director had trouble arranging his band. When he
put 2 people in each row, there was one person extra.
When he put 3 people in a row, there were two extra.
With 4 people in a row, there were three extra. Finally,
he put 5 people in a row, but then there were four extra
members. How many people could there have been in his
band?

(any answer of the form 59 + 60n, n = 0, 1, 2, ...)

4. If you could buy oranges at a price of 4 for 25 cents
and sell them at 3 for 25 cents, how many oranges would
you have to buy and sell in order to make a profit of
one dollar?

5. One hundred students were divided into three groups.
Group A had as many people as Group B and Group C had
together. Group B had six more students than Group C
had. How many students were in Group C?

(22)

6. A frustrated frog fell to the bottom of a thirty foot
deep well. Every day he managed to climb up four feet
but every night he slipped back three feet. How many
days did it take the frog to reach the top of the well?

(27)

7. A ship leaves New York for London at noon each day,
and each day at noon a ship starts from London to New
York. The trip across the ocean takes exactly three
days. If you left on a ship from New York at noon on
Monday, how many ships from London would you see by
the end of your trip on Thursday noon?

(7)

8. Mr. Carpenter makes only three-legged stools and
four-legged tables. He used 60 legs to make twice as many
stools as tables. How many stools did he make?

(12)

9. On Monday, John bought a motorbike for $60. On
Wednesday, he sold it to his friend Paul for $70. On
Friday, John bought the bike back from Paul for $80
and sold it to his brother Craig for $90. How much
money did John make or lose for all his work, or did
he come out even?

(Made $20.)



10. In a television survey concerning two programs, 350
people said that they enjoyed program X, 400 said that
they enjoyed program Y, and 200 said they enjoyed both
programs. What is the least number of people that
could have been interviewed in this survey?

(550)

11. On one television station, they show one minute of ads
and then five minutes of the program. At this rate,
how many minutes of commercials do they show in three
hours?

(30)

12. Midge was planning to join a hike to raise money for
charity. Midge's mother promised to pay her ten cents
for each mile she walked and her brother Jim promised
to pay a certain amount for each mile too. If Midge
marched 25 miles and collected a total of four dollars
from her brother and mother together, how much did Jim
pay her for each mile?

(6 cents)

13. Mr. Stout weighed 300 pounds, so he went on a diet.
The first week he lost ten pounds, but then became
careless and gained back five pounds the next week.
The third week he lost ten pounds again, but the
fourth week gained back five pounds. If he kept this
strange diet, after how many weeks would he first
weigh 250 pounds?

(9)

14. Joe's sister Susan is nine years older than he is.
In three years, Susan will be twice as old as Joe
will be. How old is Joe now?

(6 years)

15. An ostrich egg weighs about 3 pounds. A hen's egg
weighs about 2 ounces. It would take 400 hummingbird
eggs to weigh as much as a hen's egg. How many
hummingbird eggs would it take to weigh as much as one ostrich
egg?

(9600)

16. Jack has six coins. One third of his coins are dimes,
but they are worth one fourth of the total value of the
coins. What coins does Jack have?

(2 dimes, 2 quarters, 2 nickels)

17. Janet had 69 cents. Shelly asked her for change for
a half dollar. Janet tried to make the change, but
found that she didn't have the right coins to do
it. What coins did she have if each coin was less than
a half dollar?

(4 dimes, 4 pennies, and 1 quarter)

18. A dozen cookies and two loaves of bread costs $1.20.
Two dozen cookies and a loaf of bread costs $1.26. How
much does one loaf of bread cost?

(38 cents)

19. Pete the Pirate buried 1/2 of his sack of gold coins
and spent 1/3 of his sack of gold coins. Then he had
300 coins left. How many gold coins did Pete have
before he buried or spent any?

(1800)

20. Two adult tickets and one child's ticket for a movie
cost $6.25. Two adult tickets and three children's
cost $8.75. What is the cost of one adult ticket?

($2.50)

21. Suppose you could fill an old bucket with water in
40 seconds. Then it springs a leak and all the water
drains out in 120 seconds. How many seconds will it
take you to refill the bucket now that it has the leak?

(60)

22. Mr. Ketchum wants to cut a 70 yard long piece of fish
line into three parts. The second piece should be
twice as long as the first piece, and the third piece
should be twice as long as the second piece. How many
feet long should the third piece be?

23. A candy producer puts a blue ticket good for one free
bar in every 80th candy bar he produces and a red ticket
good for two free bars in every 180th bar. Which candy
bar was the first one with both a red and a blue ticket
in it?

(720th)

24. The Restful Hotel receives its glasses in full cartons
of 40 glasses each and the Towers Hotel gets its glasses
in full cartons of 24 glasses each. One time, they both
ordered the same number of glasses and both got all full
cartons to fill the order. What is the smallest number
of glasses they could order for this to happen?

(120)

25. The P.T.A. raised $40 at a bake sale. Cakes were
$1.50 each and pies were $1.00 each. Twice as many
cakes as pies were sold. How many cakes were sold? 

(20) 

26. A six-pack of eight ounce bottles of pop costs 60 cents. 
At this rate, how much should an eight-pack of sixteen 
ounce bottles cost? (Don't count the deposit for bottles.)

($1.60)

27. There once was a country where a chicken was worth 
1/10 as much as a pig and a pig was worth 1/10 as much
as a cow. A farmer who owned 8 hens, 7 pigs, and 2
cows decided to trade his pigs and cows in for hens. How 
many hens did he have after the trade? 

(278) 

28. It takes 96 square inches of paper to wrap without 
overlapping a box shaped like a cube. How many cubic
inches of space are inside the box? 

29. The Girl Scouts wanted to sell 600 boxes of cookies.
The number of boxes each troop had to sell depended
on the number of members it has.
How many boxes of cookies should
Troop 3 sell to do its share?

[Table of troop sizes; the only entry legible in the scan is "Troop 4: 20 scouts".]

30. The junior high school band marched in rows with the 
same number in each row and there were three marchers 
left over. When eight more marchers joined the band 
in marching with the same size rows as before, there 
were two marchers left over. How many marchers were
in each row?

(9) 

31. On Tuesday, the phy ed teacher divided the class
into eight teams to get the same number on each team.
On Thursday, three more students came. Then he made
seven teams in order for there to be an equal number
of students on each team. How many students could
have been in class on Tuesday?

(any answer of the form 32 + 56n, n = 0, 1, 2, ...)

32. Mr. Shopper goes to the store once every two days and
his neighbor Mr. Buyer goes to the same store once
every five days. On Friday, the two men meet at the
store. On what day of the week will both men meet at
the store again?

(Monday)

33. There are 35 girls and 28 boys at the seventh grade field
day. They join into teams so that there are both boys
and girls on each team. To keep the teams even, there
has to be the same number of boys on each team and the
same number of girls on each team. How many boy-girl
teams should there be so that everyone gets to be on a
team?

(7) 

34. Sixty wooden cubes measuring one inch on a side are glued
together to form one big solid block. When the big block
is painted, six of the little blocks don't get any paint
on them because they have blocks glued to all sides of
them. How many inches long, wide, and high is the big
block?

(5" X 4" X 3") 

35. A large square has an area equal to the sum of the
areas of the two smaller squares. To the nearest foot,
what is the length of one side of the large square?

(8 ft.)

[Figure: the two smaller squares with labeled side lengths; only "4 ft" is legible in the scan.]



36. A fireman stood on the middle rung of a ladder, directing 
water into a burning building. As the smoke lessened, he 
stepped up three rungs. A sudden flare-up forced him to
go down five rungs. Later he climbed up seven rungs and 
worked there until the fire was out. Then he climbed 
the remaining six rungs to the top of the ladder and 
entered the building. How many rungs did the whole 
ladder have? 

(23) 

37. On a balance scale (like a teeter-totter), a brick on one
side balances evenly with one third of a brick and a one
pound weight on the other side. What is the weight of
one brick?

(1 1/2 pounds)

38. A barrel full of oil weighs 50 pounds. The same barrel
filled with gasoline weighs 35 pounds. If oil is twice
as heavy as gasoline, how much does the barrel weigh
if it is empty?

(20 pounds)

39. The egg man sent a bill for 24 dozen eggs, but the first
and last digits were missing. If eggs cost less than one
dollar a dozen, how much should the bill be?

Bill for eggs
24 dozen
_2.4_

($12.48)

40. Two pirates found a bag of gold coins and agreed to split
it in the morning. After they went to bed, the first
pirate got up and took one third of the coins. Later,
the second pirate got up and took one half of the coins
that were left. In the morning, there were still 200
coins left. How many coins were there before either
pirate sneaked any out?

(600)

41. ABCD is a square with E halfway between A and B and F
halfway between D and C. If each side of the square is ten
inches long, how many square inches are in triangle DEG?

(12 1/2)

[Figure: square ABCD with points E, F, and G marked; not recoverable from the scan.]

42. The Yum-Yum ice cream man has vanilla, chocolate, and
strawberry ice cream. He has marshmallow, fudge, coconut,
and peanut toppings. If he uses two scoops of ice cream
and one kind of topping for each sundae, how many
different kinds of sundaes can he make?

(24)

43. Tom spent one dollar for his lunch. He spent 20 cents more
for french fries than he did for pop, and he spent 15
cents more for a hamburger than he did for the french fries.
How much did the hamburger cost him?

(50 cents)

44. Mr. Butcher mixes two pounds of fat with eight pounds of
lean meat when making hamburger. The lean meat is worth
$1.20 a pound, but Mr. Butcher only charges $1.10 a pound
for the hamburger and he still makes ten cents profit
on each pound. How much is each pound of fat worth?

(20 cents)

45. Mr. Hasty forgot his brief case when he left town. An
hour later, his son jumped on a motor cycle to catch him.
If Mr. Hasty drives 50 miles per hour and his son drives
60 miles per hour, how long will it take the son to
catch up with him?

(5 hours)

46. Car A gets twenty miles to a gallon and car B gets
sixteen miles to a gallon. Both cars are taking a
trip of the same distance and it is found that both
cars used a whole number of gallons of gasoline. How
many miles long could the trip have been?

(Any answer of the form 80n, n = 1, 2, 3, ...)

47. A new round rug was put on a square 
floor. The radius (distance from 
center to edge) of the rug was 10 
feet and the material covered about 
314 square feet of the floor. About 
how many square feet were not covered 
by the rug? 

(86) 

48. Hot dogs cost ten cents each and buns cost five cents
each. How much should the art club sell a hot dog
in a bun for if they want to make twenty dollars
profit on five hundred sandwiches?

(19 cents) 

49. A long freight train was moving 15 miles an hour on the
tracks parallel to a highway. It took an auto 4 minutes
from the time it was even with the caboose to the time
it passed the engine. If the auto was going 30 miles an
hour, how long was the train?

(1 mile) 

50. Wilma is running 6 yards a second and is 120 yards from
the finish line. Dorla is 40 yards behind Wilma. How
many yards a second will Dorla have to run to tie Wilma?

(8)

WRITTEN TEST ITEMS WITH ANSWERS



1. ABPJ is a square divided into equal smaller squares.
Draw a segment from point A to one of the
other named points so that the area on one
side of the segment will be three times the
area on the other side of the segment.

(to D or to H)

[Figure: square ABPJ divided into smaller squares with named points; not recoverable from the scan.]

2. When you buy stamps at the post office, their edges are
usually attached to each other. In how many different
ways can three stamps be attached to each other?
(6)

3. How many squares are there in the diagram
at the right? Include those which overlap.
(17)



4. How many triangles are there in the
diagram at the right? Include those
which overlap.
(13)

5. If E is the midpoint of AB and F is the
midpoint of DC, what fractional part of
the rectangle ABCD is spotted?
(3/8)

6. What number comes next in 1, 2, 4, 7, 11, ?



7. A class of 30 students was divided into two groups.
One group had eight more students than the other. How
many students were in the larger group?

(19)

8. Using pennies, nickels, dimes, or a combination of the
coins, how many different ways could a person make change
for a quarter?

(12)

9. The perimeter (distance around) of a swimming pool in the
shape of a rectangle is 148 feet. If the length of the
pool is 50 feet, how many square feet of surface does the
pool have?
(1200)

10. A mouse wants to get to its house, but it has to go
through two walls to get there. If the first wall has
four holes and the second wall has three holes, how many
different paths can it take to get to its house?
(12)

11. Triangle ABC has all sides equal. If the
area of the little triangle HGI is 5
square inches, what is the area of ABC?
(D, E, F, G, H, and I are all midpoints.)
(80 square inches)

12. How many ounces are in one gallon?

1 cup    = 8 ounces
2 cups   = 1 pint
2 pints  = 1 quart
4 quarts = 1 gallon

(128)

13. A race horse runs about 30 miles per hour. How many
feet does it run in one minute? (5,280 feet in 1 mile.)

(2640)

14. Dr. Gurem charges ten dollars for the first visit and
five dollars for each visit after that. If Mr. Ailings'
bill was one hundred dollars, how many visits did he make?

(19)

15. A pen costs a dollar more than an eraser. Together they
cost $1.10. How much does the eraser cost?
(5 cents)

16. What whole number for "a" will make a·b + a·c = 56 if
b is 3 and c is 4?
(8)

17. ABCD is a square with E halfway
between A and B and F halfway
between D and C. If each side of
the square is ten inches, what is
the area of triangle DEF?
(25 square inches)

18. There was half of Mom's apple pie left. Then Nate ate
one half of the half and Kate ate one half of what Nate
left. What part of the pie was left after Kate ate?
(1/8)

19. Fran gave Jan half of her cookies and another cookie
besides. Fran had seven cookies left. How many cookies
did Fran give to Jan?
(9)

20. Mr. Baker's recipe for cookies needs 1/2 cup of sugar and
two eggs. He is making a bigger batch of cookies, so he
used 2 1/4 cups of sugar. How many eggs should he use?

(9) 

21. In one school, there are five girls to every four boys. If
there are one hundred boys in school, how many girls are
there in the school?

(125) 

22. If 76 cookies fill five boxes with six cookies left over,
how many of the same sized boxes will 100 cookies fill?
(7)

23. It takes thirty chocolate chip cookies to fill two thirds
of a box. How many chocolate chip cookies would be
needed to fill the whole box with them?

(45)

24. There were 18 brown eyed students on the bus and 12
students had brown hair. If there was a total of 26
students on the bus, what is the smallest possible number
of students that had both brown eyes and brown hair?
(4)

25. Jean has four different sweatshirts and five different
pants. How many different outfits with one sweatshirt
and one pair of pants each could she make?

(20) 

26. The Yum-Yum ice cream man has vanilla, chocolate, and
strawberry ice cream. He has marshmallow, fudge, peanut,
and coconut toppings. How many different kinds of sundaes
can he make if he only uses one kind of ice cream and
one kind of topping for each sundae?
(12)

27. One small country has very few cars in it, so they use
only a one digit number followed by one letter of the
alphabet for their license plates. How many different
license plates can they make?

(260) 

28. Two test car drivers departed from the car company at
the same time, but they drove away in opposite directions.
The driver of car C averaged 60 miles per hour and the
driver of car F averaged 40 miles per hour. How many
hours was it before they were 600 miles apart?
(6)

29. On a travel tour, the Tripp family drove eight hours the
first day, five hours the second day, and seven hours the
third day. Their average speed was the same each day and
they traveled a total of 1000 miles. How far did they
travel the second day?
(250 miles)

30. Who is the shortest player of the team?

Players' heights
Lee is 5 feet.
Jerry is 63 inches.
Wilt is 2 yards.
Cazzie is 1 yard, 2 feet, and 3 inches.
Lou is 3 feet and 30 inches.

(Lee)

31. Four people are going to sit by a square table, one at
each side. How many different seating arrangements are
possible?
(24)

32. Sandy has a red book, a blue one, a yellow one, and a green
one. She wants to place them in an empty shelf of a
bookcase. In how many different orders could she
arrange the books?
(24)

33. Three pounds and 8 ounces of hamburger costs $2.80.
How much does one pound of hamburger cost?
(80 cents)

34. Triangle ABC has all sides equal. Point D
is the midpoint of AB and E is the midpoint
of BC. If the area of triangle ABC is 48
square inches, what is the area of figure
ADEC?

(36 square inches)

35. The perimeter (distance around) of a rectangular flower
garden is 60 feet. There is a 2 1/2 foot wide sidewalk
around the garden. What is the perimeter of the outer
edge of the sidewalk?
(80 feet)

36. The large cube was painted red on all sides
and then cut up into 27 smaller cubes. How
many of the smaller cubes have exactly two
red sides?
(12)

37. Nancy spent two fifths of her money for a sweatshirt.
If the shirt cost four dollars, how many dollars did
Nancy have after she bought the shirt?
(6)

38. Mixing four gallons of alcohol with twelve gallons of
water makes a solution which is one fourth alcohol. If
four more gallons of alcohol were added to the solution,
then what fractional part would be alcohol?
(two fifths)

39. Mr. Driver gets 20 miles to the gallon with his compact
car. If he drives three hours at 60 miles per hour,
how many gallons of gas does he use?
(9) 

40. A bell is made of a special metal which has 3 parts
of copper for each part of tin. How many pounds of 
tin are in a bell that weighs 3000 pounds? 
(750) 

41. Roberto gets about two hits for every six times he gets
to bat. How many times would he have to bat in order
to get 150 hits?
(450)

42. Here is a line segment AB. If you put
two more points C and D on the segment so that no points
are the same, how many segments will there be?

[Figure: segment AB.]

(6) 

43. What is the greatest number of angles less than 180 
degrees that is possible when three lines cross at 
the same point? 
(12) 

44. The population of Boom Town has doubled every five years
for the last 20 years. It had 400 people in 1970. By
what year will the population reach 12,800 if it
continues growing at the same rate?
(1995)

45. One cell divides into two cells every five seconds. If
you started with 5 cells, how long would it take to have
over 1,000 cells?
(40 sec.) 

46. Two lines can cross at only one point, but three lines 
can cross at three points. What is the most points at 
which five lines can cross? 

(10) 
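
The answer to item 46 can be checked by counting pairs: when no two lines are parallel and no three pass through one point, each pair of lines crosses exactly once. The sketch below is an illustrative check and not part of the original test; the function name is an assumption made for this example.

```python
from math import comb


def max_crossings(lines: int) -> int:
    """Most crossing points for `lines` straight lines in general position."""
    # Each pair of lines meets at most once, so the maximum is the number of pairs.
    return comb(lines, 2)


# Two lines, three lines, and the five lines asked about in item 46.
print([max_crossings(n) for n in (2, 3, 5)])
```

The first two values reproduce the facts stated in the item (one point for two lines, three points for three lines), which is a quick sanity check on the counting rule before applying it to five lines.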

47. For every two dollars Jenny earned towards a new
bicycle, her father gave her one dollar more. How
much money would Jenny's father end up giving her if
she wanted to work until she had enough for a
bicycle that costs $45?

(15) 

48. This is a funny mirror. Look what it does to the
letter G. Draw in the image of the letter P.

[Figure: a letter, the mirror, and its image; not recoverable from the scan.]

49. Two lines divide a plane into four separate (non-
overlapping) areas. What is the largest number of
separate areas that four lines can divide a plane into?

(11)

50. Using the edges of a cube as lines, how many pairs of
parallel lines are there?

(18)

51. Lance, Larry, and Lena agreed to split the money
they earned for doing errands. Lance earned $1.75
and Lena earned $2.75, but after the split each person
got $2.00. How much money did Larry earn before they
divided up the money?
($1.50)

52. If six bushels of wheat will plant four acres, how
many bushels of wheat are needed to plant 30 acres?

53. If 24 chocolates fill 3/4 of a box, how many will it
take to fill the whole box? (32)

54. If 1*2=3, 1*3=4, 2*3=7, and 3*4=13, how much is 4*5?
(21)

55. The perimeter (distance around) of a square is 40 inches.
What's its area?

(100 sq. in.)

56. The perimeter (distance around) of a rectangle is 30
inches. If the width is six inches, what is its area?

(54 sq. in.)

57. What is the area of
this figure?
(175 sq. ft.)

[Figure not recoverable from the scan.]



58. What is the perimeter (distance around) of the
rectangle ABCD?
(48 ft.)




59. The formula for finding the area of a circle is A = πr²
where r is the radius of the circle. How many times
larger does the area of a circle become if you make
its radius twice as long?
(4)

60. The minute hand on a clock makes one complete turn
(360 degrees) in one hour. How many degrees does the
hour hand turn in one hour?

(30) 

61. Rachel went to a sale where bicycles were selling for 
1/3 off the regular price. She paid $40 for a new
3-speed bike. How much was the bike before the sale?
($60)

62. Forty seventh graders were divided into two groups so
that the larger group had six more students than the 
smaller group. How many students were in the larger 
group? 

(23) 

63. Two numbers c and b have a sum of 90. If c is twice as
large as b, what number is c?

(60) 

64. The band director had the members march with three in
each row, then with four in each row, and finally with
five people in each row. In each case, there were no
extra people left over. What is the smallest number
of members this band could have?

65. If you mix eight pounds of meat worth one dollar a
pound with two pounds of soybeans worth 25 cents a 
pound, how much a pound should you charge for the 
mixture? 

66. Alex walks to school. After walking 2/3 of the way,
he still has 1/4 of a mile to go. How far is his
school from home?

(3/4 mile)

67. On a map, three and one half inches represents 70 miles. 
How many miles does six inches represent on this map?
(120) 

68. If you painted all the sides of a certain sized cube,
you would paint 600 square inches of surface. How long
is one side of the cube?
(10 in.) 

69. Scrooge has n nickels and 3n dimes. How many cents is
the total value of the dimes and the nickels together?
(35n)

70. When Vincent answered 60 questions correctly on a test,
he had 4/5 of the answers right. How many questions
were on the test?
(75)

71. Wausau is 150 miles from Madison. A truck traveling
at 40 miles per hour leaves Wausau towards Madison
at the same time a car averaging 60 miles per hour
leaves Madison to Wausau. How many miles will the
truck travel before it meets the car if they travel on
the same road?

(60) 

72. Lora got scores of 63, 72, and 65 on her first three
tests. What score must she get on her fourth test
in order to end up with an average of 70 for the four
tests? 
(80) 

73. Mr. Racer drives two hours at 50 miles per hour and
three hours at 60 miles per hour. What is his average
speed for the five hours?
(56 mph)

74. A tree has a 24 foot long shadow while a 12 inch ruler
standing next to the tree has a four inch long shadow. 
How tall is the tree? 
(72 ft.) 

75. Gear A has a radius of six inches 
and gear B has a radius of two
inches. If gear A makes 5 turns,
how many turns will gear B make?

(15)

76. What whole number must m be in order
for [fraction illegible in scan] to be a fractional number between 12 and 13?

(8)

77. Polly Hiker takes five steps to walk over three squares 
of cement in the sidewalk. How many squares could she
cover if she took 150 steps?

(90) 

78. A box holds three pounds of mint candy. If we made
the box twice as long, twice as wide, and twice as
deep, how many pounds of mint candy could it hold?
(24)

79. There are 25 students in third hour science class and
35 students in fifth hour English class. When the two 
classes are put together, there are 52 students. How 
many students from the science class are also in the 
English class? 
(8) 

80. Two numbers m and n have a sum of 80. If m is four
times as large as n, what number is m?
(64)

81. What is the largest number that can divide into both 80
and 144 without leaving any remainders except zero?
(16)

82. Six girls belong to the basketball team but only five 
can play at a time. How many different groups of five 
players could be formed by the six girls? 

(6) 

83. Here are four sections of chain. It costs 15 cents to
cut a link open and 25 cents to weld a link shut. What
is the least it would cost to make a bracelet using all
of these sections?

($1.20)

[Figure: the four chain sections; not recoverable from the scan.]

84. Four chickens lay six eggs in two days. At this rate,
how many eggs could eight chickens lay In four days? 
(24.) 

85. The number ab4 divided by 13 gives an answer of cd and
a remainder of zero. What digit does d have to be for
this to happen? (The letters a, b, c, and d all
represent digits.)
(8)

86. Five students are running for class president and vice 
president. The one with most votes is president and the 
student with the second most votes is vice-president. 
How many different combinations of president and vice- 
president are possible? 

(20) 

87. Each of John's five marbles is a different color. He 
chooses two marbles to play a game. How many different
pairs of marbles are possible to be chosen? 

(10) 
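
Items 86 and 87 differ only in whether order matters: president and vice-president are distinct roles, while a pair of marbles is unordered. The sketch below enumerates both cases as an illustrative check; the student labels and marble colors are hypothetical names chosen for this example and do not come from the test.

```python
from itertools import combinations, permutations

students = ["A", "B", "C", "D", "E"]                    # hypothetical labels for item 86
marbles = ["red", "blue", "green", "yellow", "white"]   # hypothetical colors for item 87

# Item 86: order matters, since (president, vice-president) are different roles.
officer_slates = list(permutations(students, 2))

# Item 87: order does not matter, since a pair of marbles is just a set of two.
marble_pairs = list(combinations(marbles, 2))

print(len(officer_slates), len(marble_pairs))
```

Each unordered pair corresponds to two ordered slates, which is why the first count is exactly twice the second.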

88. When numbering the pages of a book, a printer uses the
digits (0, 1, 2, ... 9) together to form larger numbers
like 94 or 617. If a printer used 51 of the digits for
a small book, how many pages did it have?

(30) 

89. If 1 * 1 = 3, 1 * 2 = 4, 2 * 3 = 6, and 3 * 4 = 8, what
does 4 * 5 equal?
(10) 

90. Squares ABEF and BCDE are the
same size. The perimeter (distance
around) of the spotted
area is 50 ft, while the distance
from G to J (through K & I)
is 15 ft. How many feet is the
perimeter of the shaded area?
(50)

91. A man owned three connected
squares of land and wanted
to divide it among his four
children. Draw lines to
show how he could divide up
the land so each child gets
an equal share.
(many solutions)

92. Twenty-five marbles are in a sack. Eight marbles are
blue, ten are green, and the rest are red. If I take
out two marbles without looking and they are two
different colors, what two colors are they most likely
to be?

(blue and green)

93. The figure ABCD has side AB
parallel to side CD. What
is the area of the figure?

(16 sq. ft.)

94. Diane's bus left Wausau at 1:40 and arrived in Madison
at 4:15. How many minutes long was her bus ride?
(155)

95. Jim left Racine at 3:20 and took one hour and fifty
minutes to drive to Madison. What time did he arrive
in Madison?
(5:10)

96. Julie painted the entire surface of a board three feet
long, ten inches wide, and one inch thick. How many
square inches of surface did she paint?
(812)

97. Towns A, B, and C are all ten
miles apart. Town D is halfway
between A and B and is
about eight and a half miles
from C. If you lived in town
D and wanted to visit all
three other towns in one day,
what is the smallest number of
miles you would need to travel?
(30)

98. About two thirds of a fish can be eaten; the rest is
waste. How many pounds of fish must Mr. Angler catch
in order to have 12 pounds to eat?
(18)

99. Jeremy paid $10 for 100 hot dogs and $5 for 100 buns.
If he wants to make five dollars profit when selling
sandwiches, how much should he charge for each hot
dog in a bun?
(20¢)

100. The Pottery Club sold 20 dozen cookies for 60 cents a
dozen. If it cost 35 cents a dozen to make the cookies,
how much total profit did they gain?
($5.00)

101. According to the tax table,
how much would you end up
paying for a taxable item
priced at $3.59?
($3.73)

102. For a new blanket Oscar paid
$5.49 including tax. What
was the price of the blanket
before tax?

Tax table (for items 101 & 102)
Cost brackets: 13-37¢, 38-62¢, 63-87¢, 88-99¢, and "for each dollar".
Tax entries legible in the scan: 0, 1¢, 3¢.

103. In a class of thirty students, 20 students wore shoes
and 10 wore sandals. If half of the class is boys,
what is the least possible number of boys wearing
shoes?
(5)

104. Jack gave half of his money to Jill. Then Jill gave
half of the money she got from Jack to Jane. After
Jane spent ten cents of the money from Jill, she had
a quarter left. How much money did Jack have before
he gave any away?
($1.40)

105. Candy bars cost ten cents each if you buy them separately
or three for a quarter if you buy them in groups of three.
How much would you save on two dozen candy bars if you
bought them in groups of three instead of separately?
($0.40)

106. The Mathematics Club has four committees of two people
each. Members may belong to more than one committee,
but no two committees have the same people working
together. What is the smallest number of people that
could belong to the Mathematics Club?
(4)



107. N is a number on the number line half way between 1/2
and 3/4. What number is N?

(5/8)

108. Paul has 60 different baseball cards and Jim has 50
different baseball cards. Twenty of Paul's cards have
the same players that Jim has. How many different
players do Paul and Jim have together?

(90)

109. Mr. Grocer has 5 pounds and 2 ounces of sunflower seeds
to put into 2 ounce bags. How many bags of sunflower

110. D is the decimal number half way between .5 and .6.
What number is D?
(.55) 

111. Which player has the best
record when you consider
both shots attempted and
shots made?
(Art)

[Table for item 111: shots attempted and shots made for each player, including Art, Luke, Rod, and Ell; the values are only partly legible in the scan.]

112. Scott threw 60 passes and completed 25 of them. Jim has
thrown only 36 passes, but has completed the same percentage
of them as Scott has. How many passes has Jim
completed?

(15) 

113. Joe has completed 25 passes in 60 attempts while Jerry
has completed 9 passes in 20 attempts, and Rudy has
completed 11 passes in 25 attempts. Which passer has
the best record?

(Jerry) 

114. Rent-a-car charges $7.00 a day plus ten cents a mile.
If Mr. Salesman's bill for 6 days was $79.80, how many
miles did he travel?
(378) 

115. One Tuesday, the temperature reached 25 degrees above
zero at noon and dropped to 19 degrees below zero at
night. The next day, the temperature at noon was half
way between Tuesday's warmest and coldest readings. What
was the temperature at noon on Wednesday?

(3° above)

116. Jess weighs 175 pounds and Marsha weighs 113 pounds.
If Neil's weight is half way between the two weights,
how much does he weigh?

(144 lbs.)

117. Lucy had five yards of ribbon. Snoopy bit off sixteen
inches of it, Peanuts took two feet of it, and Charlie
took two yards of it. How much ribbon did Lucy have left?

(1 8/9 yds., OR 1 yd. 2 ft. 8 in. OR 68 in.) 

118. Maud earns $2.10 an hour. How much money does she earn
in ten minutes?

($0.35)

119. Wes earned $2.00 for working one hour and fifteen
minutes. How much would he earn in an hour?
($1.60)

120. Material costs $1.80 a yard. How much would five feet
and three inches of material cost?

($3.15)

121, Scott cut a five yard and two foot pole into halves. 
How long was each piece? 

(8 1/2 ft.)

122. Marsha works three hours and forty-five minutes on her
part time job after school each day. How many hours does
she work each school week of five days?

(18 3/4)

123. The area of a rectangle is 180 square inches. If its
width is one foot, what is the perimeter (distance around)
of the rectangle?

(54 inches)

124. The area of a rectangle is 5 1/4 square feet. If its
width is six inches, how many feet is the length of
the rectangle? 
(10 1/2) 

125. A party mix needs 3 ounces of Rice Chex, four ounces
of Corn Chex, and five ounces of peanuts. If you
wanted to make two pounds of mix, how many ounces of
Rice Chex would you need?

(8)

126. A 6 gallon bucket has a hole that leaks out one quart
of water in a minute. If a faucet can pour in one
gallon in a minute, how long will it take to fill the
bucket? (4 quarts makes 1 gallon)

(8 min.)

127. Using only nickels or quarters or a combination of them,
how many ways are there to make change for a dollar?

(5) 
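
Coin-counting items such as item 127 (and item 8 of the written test items) can be checked by building up the number of ways one coin type at a time. The following is a minimal sketch of that standard counting method, not a procedure taken from the report; the function name and parameters are assumptions made for this example.

```python
def count_change(total: int, coins: list[int]) -> int:
    """Count the ways to make `total` cents from the given coin values."""
    ways = [1] + [0] * total  # one way to make 0 cents: use no coins
    # Handling one coin type at a time counts combinations, not orderings.
    for coin in coins:
        for amount in range(coin, total + 1):
            ways[amount] += ways[amount - coin]
    return ways[total]


# Item 127: nickels and quarters making a dollar.
print(count_change(100, [5, 25]))
# Written test item 8: pennies, nickels, and dimes making a quarter.
print(count_change(25, [1, 5, 10]))
```

For item 127 the count can also be seen directly: using 0, 1, 2, 3, or 4 quarters fixes the number of nickels each time.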

128. Torra ate five pancakes in twelve minutes, Sam ate 3
pancakes in eight minutes, and Gail ate 4 pancakes in
10 minutes. Who ate the fastest?

129. If a car is traveling at 40 miles per hour, how far will
it travel in 75 minutes?
(50 miles)



(Appendix D, con't.)

130. A freight train had stopped on the tracks and Tony
jogged alongside of it from the caboose to the engine
in five minutes. If Tony jogs at 6 miles per hour,
how long is the train?
(1/2 mile or 2,640 ft.)

131. Archie runs four feet per second faster than Bob. It
takes Archie 5 seconds to run the 40 yard dash. How
long does it take Bob to run 40 yards?
(6 sec.)

132. The bakery put its fresh batch of cookies into 6 size [illegible]
boxes with ten cookies left over. The next batch was
twice as big, and fit evenly into 13 size [illegible] boxes.
How many cookies were in each box?
(20)

133. Cindy borrowed $3000 to buy a car. She agreed to pay
$100.00 a month for 3 years to repay the loan plus
interest. How much interest did she pay in the 3 years?
($600.)

134. Jan put $15.50 in a bank where they pay six cents
interest for each dollar you leave in for one year.
How much money would she have in the bank after one
year?

135. Jan put money in a bank where they pay six cents interest
for each dollar you leave in for one year. A year later,
her money plus the interest totaled $53.00. How much had
she put in the bank?
($50.00)

136. Tiles for floors come in different shapes. Which one
of the shapes pictured here could not cover (without
leaving spaces) a square floor?

137. One plane cuts space into two parts and two planes can
cut space into at most four parts. What is the largest
number of parts that three planes can cut space into?

138. Mr. Cord has a 35 foot rope, a 4[?] foot rope, and a 56
foot rope. He wants to cut all three ropes into smaller
pieces so that all the pieces are the same length. He
wants these equal pieces to be as long as possible without
wasting any rope. How long should each piece be?

139. Boontown is 50 miles from Clinton and Clinton is 30
miles from Adams. What is the closest possible distance
from Adams to Boontown?
(20 miles)

140. What is the smallest number that can be divided by
8, 10, and 12 without leaving any remainder except zero?
(120)

141. A group of boys are standing in the lunch line so that
there are two boys in front of a boy, there are two boys
behind a boy, and there is a boy between two boys. What
is the smallest possible number of boys in the lunch line?
(3)

142. Four students are standing in the lunch line. How many
different ways could these four students be lined up?
(24)

143. Here are shapes made up of six attached squares. Which
shape could not be folded into the shape of a cube?
(D)

144. Here is a figure made up of six squares. If you
are allowed to slide and turn, but not flip this figure,
which figure below would not be possible to match?
(B)

145. Each of the figures below has all sides and angles
equal. Which figure could not be used as a tile on
a floor (because they would leave spaces if you tried
to fit the tiles together)?
(C)
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146. Mr. Driver fills his gas tank whenever he gets down to 
one-fourth of a tank. During a trip, he started with
a full tank, filled up twice along the way, and had half
a tank left when he returned home. If his tank holds
twenty gallons, how many gallons of gasoline did he use
on the trip?
(40)

147. Bret was the season's leading scorer with 63 points,
but Jeff was only six points behind. During the next
game, Bret scored nine points and Jeff scored 17 points.
In the next game after that, each player scored 15
points. How many points has Jeff scored this season?
(89)

148. Mrs. King has won three times as many tennis matches
as she has lost. If she has played 120 matches, how
many has she won?
(90)

149. Mr. Riggs has five wins for every two losses in his
tennis matches. If he has won 150 matches, how many
losses does he have?
(60)

150. Willa spent 50 cents on 10¢ pencils and 5¢ erasers. If
she bought at least one pencil and one eraser, how many
different combinations of pencils and erasers could
she buy?
(5)

151. Fred has to put 175 marbles into sacks so there is the
same number in each sack. If he can't put all the marbles
into one sack, what is the smallest number of sacks
he will need?
(5)

152. Janis started her trip with a full tank of gasoline.
After driving 1 1/2 hours, she had 2/3 of a tank of
gasoline left. How many hours can she drive on a
whole tank of gasoline under similar conditions?
(4 1/2)

153. John said that he paid about $240.00 for his television
set. If he had rounded off to the nearest ten dollars,
what is the least he could have paid for his set?

154. One taxi driver gets 35¢ for each dollar clocked on
the taxi meter. He also gets tips. If he made a
total of $25.00 one day for clocking $60.00 on the
meter, how much money in tips did he get?
($4.00)

155. A person working in a restaurant gets paid by the hour
plus tips. If the tips average half of the hourly
wage and the total of the two is $2.40 an hour, how
much an hour does the person get in tips?
($0.80)
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156. A three inch rubber band can be stretched to seven 
inches without breaking it. If a five inch rubber band
were made out of the same batch of rubber, how long
should it be possible to stretch it without breaking?
(11 2/3 in.)

157. John buys pencils at three for ten cents and sells them
at a nickel each. How much profit would he earn on a
dozen pencils?

158. In figure ABCDEFGHIJ, all the
horizontal parts of the steps
are equal lengths and all the
vertical parts of the steps
are equal lengths. What is
the area of the figure?
(72 sq. ft.)

159. If apples cost five pounds for 99¢ and there are about
five apples to a pound, approximately how much would
twenty apples cost (to the nearest cent)?

160. Mr. Roofer charges $200. to reshingle a rectangular roof
that is 40 feet by 60 feet. His next job is on a rectangular
roof twice as long and twice as wide. How much
should he charge for the bigger roof?
($800.)

161. After Mrs. Merchant reduced a $5.00 shirt by a certain
fraction of the price, the new price was $4.00. Later
she reduced the $4.00 price by the same fraction as before.
What is the price of the shirt after the second
reduction?
($3.20)

162. A box of candy was passed around the class. Each student
in turn took one piece and passed the box on until
all 100 pieces were gone. Joe got four pieces including
the first piece and the last piece. How many students
were in the class?
(33)

163. In the last two months, gasoline has increased from
thirty five cents a gallon to forty cents a gallon. If
it keeps increasing at the same rate, how many months will
it be before gasoline will cost one dollar a gallon?
(24)

164. By slowing down from sixty miles an hour to fifty miles
an hour, Don gets three more miles per gallon of
gasoline with his car. He gets eighteen miles per gallon
at the slower speed. How many gallons would he save on
a 180 mile trip if he traveled at 50 instead of 60 miles
per hour?
(2)

165. The rent for an indoor ice rink is $40. per hour. If
25 people skate for 45 minutes and share the cost
equally, how much will each have to pay?
($1.20)

INSTRUCTIONS FOR THE INTERVIEW TEST

The purpose of this interview is to obtain some information
on the ways in which people like you solve mathematical
problems. This is not a test and you don't have to worry
about passing it or getting a grade on it. Try to do your
best though.

You will be asked to work on a small set of problems and
to think aloud as you work on each problem. This means that
you should say out loud all the things you are thinking
while you try to solve the problem. I will record what you
say so that I can remember how you solved the problem and
so that I can talk to you.

There are only four rules to follow while you work on
the problems.

1. Read each complete problem out loud before you
start to work on it. Talk in your usual tone of
voice and try to be clear enough for me to understand
what you are saying.

2. Write down anything that you want. There is more
paper if the problem sheet isn't enough. Don't
erase anything: just draw a line through it if you
decide not to use it. Keep talking even when you
are writing.

3. If you have tried hard to solve a problem and can't
get the answer, then just tell me and we can go to
the next one.

4. Tell me when you have finished one problem and are
ready to start the next one.

Some of your friends might be helping me do this study,
so please do not talk about the problems or the interview
with them. It may only cause them to get confused and mix
up the results of this study. Thank you for helping me.
Maximum of 5 points broken into 3 subscores:

1) Approach Score

A maximum of one point was awarded if it was clear that
the subject understood the data, conditions, and objective of
the problem. This was indicated by the nullification or
correction of all structural errors. No points were awarded if
confusion on any of the three parts of the problem prevented
the subject from establishing a direction which could lead to
a correct solution.

2) Plan Score

A maximum of two points was awarded when the subject
had derived enough relationships to solve the problem or had
produced a sequence of approximations which had focused on the
correct solution. Structural errors had to be corrected or
nullified. Executive errors were permitted if they did not
obscure the solution path.

One point was awarded if the rationale for a key step
in the solution was lacking or an important relationship or
step prevented the subject from achieving a completed solution.
An uncorrected structural error would also be a source of an
incomplete or unclear solution path. No points would be
awarded for haphazard, unclear, or undirected procedures or
plans.

3) Result Score

A maximum of two points was awarded when the subject
established a correct form of the solution. All structural
or executive errors had to be corrected or nullified to score
two points.

One point was awarded for a correct numerical result
but with incorrect units, or if the result was a close
approximation of the solution, or if the subject failed to
provide all the required unknowns.

4) Total Score

The total score for a single problem was the sum of the
approach, plan, and result scores. Thus, an integral score
ranging from 0 to 5 inclusive was possible.
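
The arithmetic of the scoring scheme can be sketched in a few lines of Python. This is only an illustration of the point limits stated above (approach 0 to 1, plan 0 to 2, result 0 to 2); the class name `ProblemScore` and its fields are hypothetical and do not appear in the report.

```python
from dataclasses import dataclass

# Maximum points per subscore, as stated in the scoring scheme.
MAX_APPROACH = 1
MAX_PLAN = 2
MAX_RESULT = 2


@dataclass(frozen=True)
class ProblemScore:
    """Subscores for one interview problem (hypothetical container)."""

    approach: int
    plan: int
    result: int

    def __post_init__(self):
        # Reject any subscore outside the range the rubric allows.
        for name, value, limit in (
            ("approach", self.approach, MAX_APPROACH),
            ("plan", self.plan, MAX_PLAN),
            ("result", self.result, MAX_RESULT),
        ):
            if not isinstance(value, int) or not 0 <= value <= limit:
                raise ValueError(f"{name} score must be an integer from 0 to {limit}")

    @property
    def total(self) -> int:
        # Total score is the sum of the three subscores (0 to 5 inclusive).
        return self.approach + self.plan + self.result
```

For example, a subject who understood the problem and planned it fully but reported the answer in the wrong units would be recorded as `ProblemScore(approach=1, plan=2, result=1)`, giving a total of 4.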



Appendix G
PROCESS-SEQUENCE CODES

Process Symbols

R   = reads the problem
Rr  = rereads the problem or parts of it
Rs  = restates the problem in his own words
S   = separates or summarizes data

[symbol illegible] = introduces model by means of a diagram
M[subscript illegible] = modifies existing diagram

Me  = model introduced by means of equation, expression or other
      relationship
Alg = algorithmic process

DX  = exploratory work with data (direction not apparent)
DS  = deduction by synthesis (direction apparent)
DA  = deduction by analysis

TR  = random trial and error (no pattern apparent)
TS  = systematic trial and error (pattern apparent)
An  = reasoning by analogy
N   = not classifiable
C   = checks the result

Outcomes of DX, DS, DA, TR, TS, N Processes

1 = abandons process

2 = impasse

3 = incorrect final result

4 = correct final result

5 = intermediate result
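
As an illustration of how the process symbols and outcome codes fit together, the following Python sketch stores the legible entries of the code list in lookup tables and rejects outcome codes attached to processes that do not take one. The names `PROCESS_SYMBOLS`, `OUTCOMES`, and `describe` are hypothetical, and the two diagram symbols that are illegible in this copy are left out.

```python
# Legible process symbols from the code list (diagram symbols omitted).
PROCESS_SYMBOLS = {
    "R": "reads the problem",
    "Rr": "rereads the problem or parts of it",
    "Rs": "restates the problem in his own words",
    "S": "separates or summarizes data",
    "Me": "model introduced by means of equation, expression or other relationship",
    "Alg": "algorithmic process",
    "DX": "exploratory work with data (direction not apparent)",
    "DS": "deduction by synthesis (direction apparent)",
    "DA": "deduction by analysis",
    "TR": "random trial and error (no pattern apparent)",
    "TS": "systematic trial and error (pattern apparent)",
    "An": "reasoning by analogy",
    "N": "not classifiable",
    "C": "checks the result",
}

# Outcome codes, defined only for the processes listed below.
OUTCOMES = {
    1: "abandons process",
    2: "impasse",
    3: "incorrect final result",
    4: "correct final result",
    5: "intermediate result",
}
PROCESSES_WITH_OUTCOME = {"DX", "DS", "DA", "TR", "TS", "N"}


def describe(symbol, outcome=None):
    """Return a plain-language description of one coded process."""
    if symbol not in PROCESS_SYMBOLS:
        raise KeyError(f"unknown process symbol: {symbol}")
    text = PROCESS_SYMBOLS[symbol]
    if outcome is None:
        return text
    if symbol not in PROCESSES_WITH_OUTCOME:
        raise ValueError(f"{symbol} does not take an outcome code")
    return f"{text}: {OUTCOMES[outcome]}"
```

A call such as `describe("DS", 4)` spells out a deduction by synthesis that ended in a correct final result.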

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Appendix G 

Punctuation Marks 

(dash)  hesitation of approximately 15 seconds
( )     scope of DX, DS, DA, TR, TS or N process
,       inserted between successive processes

[symbol illegible]  stops with solution (correct or incorrect)
/       stops without solution

Errors

se above process symbols  = structural error in process

ee above process symbols  = executive error in process

sec above process symbols = structural error corrected

eec above process symbols = executive error corrected
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PILOT STUDY WT RESULTS 

16 Items



Subject 

1 
2 
3 
A 
5 
6 
7 
8 



1 
I 



2 
I 
I 



I 
I 
I 

I 



3 A 5 
I 



I 

I I 



7 
I 
I 
I 

I 
I 
I 
I 



8 



9 
I 
I 

I 
I 
I 
I 
I 



Problem 
10 
I 
I 



11 



I 
I 
I 
I 
I 



12 
I 

I 
I 



I 
I 



13 



An "I" indicates that the subject got the correct solution.
A blank indicates that the subject got an incorrect answer.

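Kuder-Richardson Formula 20:

$$ r_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma^2}\right) $$

k        = number of items
p_i      = proportion of subjects who got item i correct
q_i      = proportion of subjects who got item i wrong
\sigma^2 = variance of the total scores of the subjects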
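
A minimal Python sketch of the Kuder-Richardson Formula 20 computation follows, written from the definitions above. It assumes a 0/1 response matrix with one row per subject and one column per item, and it uses the population variance (dividing by the number of subjects); the report does not say which variance convention was used, so that choice is an assumption. The function name `kr20` is hypothetical.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for a 0/1 matrix (rows = subjects)."""
    n_subjects = len(responses)
    n_items = len(responses[0])
    if n_subjects < 2 or n_items < 2:
        raise ValueError("need at least two subjects and two items")

    # Variance of the subjects' total scores (population form: an assumption).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_subjects
    variance = sum((t - mean_total) ** 2 for t in totals) / n_subjects
    if variance == 0:
        raise ValueError("total scores have zero variance")

    # Sum over items of p_i * q_i, with q_i = 1 - p_i.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_subjects
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / variance)
```

The function expects the matrix of correct ("I") and incorrect (blank) marks coded as 1 and 0.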
SOLUTION AND CODING TIMES OF SUBJECTS' PROTOCOLS
Subject
Number     Solution Time (Minutes)     Coding Time (Minutes)

   1                 22                         44
   2                 10                         20
   3                  *                          *
   4                 11                         58
   5                 14                         48
   6                 12                         34
   7                 16                         27
   8                 18                         25
   9                 28                         56
  10                 18                         46
  11                 25                         87
  12                 22                         56
  13                 20                         28
  14                  8                         28
  15                 18                         50
  16                  9                         28
  17                 10                         32
  18                 11                         38
  19                  *                          *
  20                 14                         82
  21                 10                         33
  22                 11                         36
  23                 17                         45
  24                  8                         24
  25                 14                         36
  26                 13                         54
  27                 17                         58
  28                 17                         55
  29                  8                         18
  30                 12                         36
  31                 20                         50

*Due to technical problems, the time was not recorded.
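
The tabulated times can be summarized with a short Python sketch. The data are transcribed from the table above, with `None` standing for the two protocols whose times were not recorded; the names `TIMES` and `summarize` are hypothetical, and the summary statistics chosen here (means and the overall coding-to-solution ratio) are an illustration, not figures reported by the author.

```python
from statistics import mean

# (subject, solution minutes, coding minutes); None = time not recorded.
TIMES = [
    (1, 22, 44), (2, 10, 20), (3, None, None), (4, 11, 58), (5, 14, 48),
    (6, 12, 34), (7, 16, 27), (8, 18, 25), (9, 28, 56), (10, 18, 46),
    (11, 25, 87), (12, 22, 56), (13, 20, 28), (14, 8, 28), (15, 18, 50),
    (16, 9, 28), (17, 10, 32), (18, 11, 38), (19, None, None), (20, 14, 82),
    (21, 10, 33), (22, 11, 36), (23, 17, 45), (24, 8, 24), (25, 14, 36),
    (26, 13, 54), (27, 17, 58), (28, 17, 55), (29, 8, 18), (30, 12, 36),
    (31, 20, 50),
]


def summarize(times):
    """Mean solution and coding times over protocols with recorded times."""
    recorded = [(s, c) for _, s, c in times if s is not None and c is not None]
    solution = [s for s, _ in recorded]
    coding = [c for _, c in recorded]
    return {
        "n": len(recorded),
        "mean_solution": mean(solution),
        "mean_coding": mean(coding),
        # Minutes of coding needed per minute of recorded problem solving.
        "coding_per_solution_minute": sum(coding) / sum(solution),
    }
```

Calling `summarize(TIMES)` returns the number of usable protocols together with the two means and the ratio.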
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AGREEMENT ON CODING AND SCORING VARIABLES 

Coders 1 and 2

            Index of      Frequency of   Frequency of    No. of Positive   Agreement
Variable    Reliability   Agreement      Disagreement    Observations      Ratio

Rr            .89            24              ?                ?              .73
S             .40            12              ?                ?              .71
DS            .94            25              9                ?              .74
DX            .68            15              2                9              .88
DA            .81            19             10                ?              .66
TS            .97            17              ?                ?              .89
TR            .59            14              ?                4              .88
Me            .97            32              ?               49              .?2
ee            .98            20              4                ?              .83
se            .71            13              4                ?              .76
Mf(d)        1.00            16              0                ?             1.00
Alg           .86            42             21                ?              .67
C             .86            16              ?                ?              .73
X?(d)        1.00            16              0                ?             1.00
X16           .99            19              1                6              .95
X17           .87            15              1                6              .94
X20           .30            13              3                4              .81
X21           .68            15              1                2              .94
X26(d)        .88            13              3               16              .81
X27           .58            11              5               16              .69
X28           .91            12              4               16              .75

d = dichotomous variable
? = entry illegible in this copy
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Appendix J (Continued) 
AGREEMENT ON CODING AND SCORING VARIABLES
Coders 2 and 3

            Index of      Frequency of   Frequency of    No. of Positive   Agreement
Variable    Reliability   Agreement      Disagreement    Observations      Ratio

Rr            .91            22              7               23              .76
S             .29            15              1                3              .94
DS            .91            23              9               26              .72
DX            .33            14              2                4              .88
DA            .89            19              4               11              .83
TS            .82            16              0                4             1.00
TR            .59            14              2                3              .88
Me            .79            31             21               45              .60
ee            .95            18              4               15              .82
se            .36            11              6                9              .65
Mf(d)        1.00            16              0                2             1.00
Alg           .84            42             17               57              .71
C             .76            17              4               11              .81
X?(d)        1.00            16              0                5             1.00
X16           .75            16              0                3              .86
X17           .87            16              0                3             1.00
X20           .30            14              2                4              .88
X21           .68            15              1                2              .94
X26(d)       1.00            15              1               16              .94
X27           .72            11              5               16              .69
X28           .89            12              4               16              .75

d = dichotomous variable
AGREEMENT ON CODING AND SCORING VARIABLES
Coders 1 and 3

            Index of      Frequency of   Frequency of    No. of Positive   Agreement
Variable    Reliability   Agreement      Disagreement    Observations      Ratio

Rr            .93            24              5               23              .83
S             .74            14              3                6              .82
DS            .97            31              6               29              .84
DX            .68            15              1                4              .94
DA            .84            19              9               17              .69
TS            .97            17              2                7              .89
TR           1.00            16              0                3             1.00
Me            .89            38             24               53              .61
ee            .94            17              7               18              .71
se            .66            15              1                4              .94
Mf(d)        1.00            18              0                5             1.00
Alg           .95            48             16               62              .75
C             .81            16              7               13              .70
X?(d)        1.00            16              0                5             1.00
X16           .76            18              3                8              .86
X17           .71            16              0                4             1.00
X20           .43            15              1                3              .94
X21          1.00            16              0                1             1.00
X26(d)       1.00            14              2               16              .88
X27           .70            12              ?               16              .75
X28           .96            15              ?               16              .94

d = dichotomous variable
? = entry illegible in this copy
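
The agreement ratios tabulated for each pair of coders appear consistent with the frequency of agreement divided by the sum of the frequencies of agreement and disagreement (for example, 24 agreements and 5 disagreements on Rr give .83). That reading is inferred from the tabulated figures rather than stated in this appendix, so the Python sketch below should be taken as an assumption; the function name `agreement_ratio` is hypothetical.

```python
def agreement_ratio(agreements, disagreements):
    """Proportion of coding decisions on which two coders agreed.

    Assumed definition: agreements / (agreements + disagreements).
    """
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no observations to compare")
    return agreements / total
```

Rounded to two places, `agreement_ratio(38, 24)` reproduces the .61 tabulated for variable Me and `agreement_ratio(48, 16)` the .75 tabulated for Alg (Coders 1 and 3).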
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SUBJECT SCORES ON THE INTERVIEW TEST 

Subject    P1   P2   P3   P4   P5   P6   Subtotal   Total

            0    1    1    1    1    1       5
   1        0    1    0    2    0    2       5        14
            0    0    0    2    0    2       4

            0    0    1    1    0    0       2
   2        0    0    1    2    0    0       3         9
            0    0    2    2    0    0       4

            1    0    0    1    0    0       2
   3        1    0    0    2    0    0       3         8
            2    0    0    1    0    0       3

            0    1    1    1    1    1       5
   4        0    2    2    2    0    1       7        18
            0    2    2    1    0    1       6

            0    0    1    1    1    0       3
   5        0    0    1    2    1    0       4         8
            0    0    0    1    0    0       1

            0    0    0    1    0    0       1
   6        0    0    0    1    0    0       1         3
            0    0    0    1    0    0       1

            0    0    0    1    1    0       2
   7        0    0    0    1    1    0       2         5
            0    0    0    1    0    0       1

            0    1    1    0
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Appendix M 
MULTIDIMENSIONAL SCALING RESULTS
FOR 2, 3, AND 4 DIMENSIONS

Kruskal-Guttman-Lingoes-Roskam Smallest Space Coordinates
for M = 2 and M = 3 (Weak monotonicity)

Variable          Dimension*                    Dimension**
(Subject)       1          2            1          2          3

    1        -21.393    -71.282      -20.912    -57.460    -86.621
    2         27.379    -74.084       25.113    -97.414    -94.084
    3         33.240    -65.017       33.171    -86.088    -90.849
    4        -50.811    -60.816      -50.695    -77.364    -87.606
    5         28.655    -42.829       30.577    -60.625    -67.418
    6         72.142    -61.083       73.213    -85.313    -84.304
    7         53.957    -55.576       56.596    -72.477    -84.451
    8         27.390    -74.121       25.113    -97.417    -94.077
    9        -61.852    -72.602      -61.299    -61.977    -90.878
   10         39.599    -68.416       38.289    -86.138   -100.000
   11         24.459    -33.732       27.143    -43.454    -80.710
   12         13.176   -100.000       16.945    -29.435    -84.099
   13         47.337    -59.902       48.207    -79.245    -91.553
   14         60.832    -54.958       63.836    -70.576    -88.902
   15       -100.000    -63.828     -100.000    -82.651    -83.109
   16         66.193    -64.123       66.751    -83.409    -80.840
   17         14.534    -56.334       15.307    -76.831    -81.454
   18         53.980    -55.710       56.596    -72.479    -84.441
   19        -49.625    -4?.806      -47.854    -98.040    -76.593
   20         28.022    -54.670       30.833    -66.993    -82.649
   21         -8.782    -50.000       -6.437    -86.377    -67.633
   22         14.534    -56.301       15.308    -76.833    -81.448
   23         21.690    -59.488       22.400    -74.375    -90.991
   24         98.835    -63.762       99.074    -94.341    -83.519
   25        -42.629    -45.080      -40.152   -100.000    -82.019
   26           ?       -54.666      -66.238    -85.147    -79.799
   27        -50.810    -61.023      -50.698    -77.345    -87.523
   28         66.934    -48.035       71.371    -63.088    -83.121
   29        100.000    -61.070      100.000    -91.891    -74.758
   30         34.721    -51.848       35.035    -85.087    -71.4?3
   31         15.702    -68.303       20.641    -53.233    -90.?83

*  Kruskal's stress = .07619 in 7 iterations.
   Guttman-Lingoes' coefficient of alienation = .00000

** Kruskal's stress = .01274 in 89 iterations.
   Guttman-Lingoes' coefficient of alienation = .00000

? = digit or entry illegible in this copy



MULTIDIMENSIONAL SCALING RESULTS
FOR 2, 3, AND 4 DIMENSIONS

Kruskal-Guttman-Lingoes-Roskam Smallest Space Coordinates
for M = 4 (Weak monotonicity)

Variable                      Dimension
(Subject)       1           2           3           4

    1        -20.687        ?        -80.473     -95.742
    2         26.156     -99.989     -89.823     -93.869
    3         34.056     -87.886     -91.???     -89.940
    4        -51.374     -77.728     -81.850     -93.626
    5         31.523     -60.236     -76.348     -71.010
    6         75.098     -86.680     -79.434     -88.800
    7         58.531     -73.115     -85.723     -86.009
    8         26.155    -100.000     -89.926     -93.744
    9        -61.537     -62.700     -73.722    -100.000
   10         39.790     -88.931    -100.000     -91.950
   11         27.890     -42.843     -88.532     -81.611
   12         17.548     -28.036     -64.703     -91.738
   13         49.877     -81.734     -87.708     -94.021
   14         64.843     -71.061     -85.658     -96.004
   15       -100.000     -85.255        ?        -93.155
   16         69.148     -88.944     -79.4??     -82.251
   17         15.947     -77.590     -79.953     -85.395
   18         58.530     -73.119     -85.697     -86.005
   19        -49.872     -96.979     -89.753     -81.769
   20         31.481     -67.382     -82.112     -86.358
   21         -7.008     -88.???     -73.829     -72.??3
   22         15.944     -77.590     -80.053     -85.326
   23         23.002     -75.776     -86.584     -96.073
   24        100.000     -93.268     -70.650     -99.151
   25        -??.900     -97.215     -97.775     -87.731
   26        -68.483     -83.395     -83.550     -85.235
   27        -51.415     -77.580     -82.100     -93.4?0
   28         73.804     -63.524     -84.693     -88.976
   29         99.610     -89.709     -61.741     -96.777
   30         35.933     -85.893     -78.342     -73.376
   31         21.162     -53.518     -90.792     -93.215

Kruskal's stress = .01151 in 40 iterations.
Guttman-Lingoes' coefficient of alienation = .00000

? = digit or entry illegible in this copy
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Appendix N 
GAMMA VALUES FOR CLUSTERING 
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